Chapter I
INTRODUCTION

Recently completed research by Schalock, Beaird and Simmons (1964)
on the predictive power of tests which use motion pictures as test
stimuli suggests that a methodology may now be at hand which will permit
the prediction of teaching behavior in the classroom. Using student
teachers as subjects, Schalock et al. were able to demonstrate multiple
correlations of .69 to .87 between scores on a battery of
situational-response tests (tests which use motion picture
representations of classroom situations as test stimuli) administered
prior to student teaching and observational measures of their behavior
in the classroom during student teaching. This represents an unusual
accomplishment, for typically studies in the behavioral sciences have
not been able to account for more than 50 per cent of the variance in
any criterion that has been predicted to, and when the criterion has
been as complex as teaching behavior the level of prediction has nearly
always been less. In the Schalock, Beaird and Simmons study at least
50 per cent of the variance was accounted for in each of the 15 separate
criterion measures used (concrete behavior of teachers in the classroom)
and as much as 75 per cent of the variance was accounted for in some.
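
As a reading aid, a multiple correlation R is converted to the proportion of criterion variance accounted for by squaring it. The conversion below applies that to the two endpoints quoted above; it is a worked illustration added here, not a computation taken from the original report.

```latex
R^2 = (0.69)^2 \approx 0.476, \qquad R^2 = (0.87)^2 \approx 0.757
```

Read this way, the quoted correlations correspond to roughly 48 to 76 per cent of criterion variance, which the text appears to state in round figures as "at least 50" and "as much as 75" per cent.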

Unfortunately, several factors tend to temper the confidence that
can be placed in the findings that came from the study. First, a small
N (40) coupled with a relatively large number of predictor variables (18)
could have led to the multiple correlations being spuriously high.
Cronbach (1960) has warned that validity shrinkage is likely to be great
from one study to another when many predictors are tried and when weights
are determined from small samples. Dunn (1959) has gone so far as to
say that multiple regression methodology, the strategy of analysis used
in the Schalock, Beaird and Simmons study, is not a particularly reliable
methodology. She attempted to predict to choice of field of study and
success in it, using grades as criteria of success, and found that for
her first sample (N = approximately 500) multiple R's ranged from .416 to
.914, but in a cross-validation group, using the same predictors to the
same measures, found correlations of -.433 to .160. These data, in
combination with the large number of predictors and the relatively small
number of subjects used in the Schalock et al. study, make the
correlations coming from it suspect. In defense of the study, however,
the number of predictors never exceeded N/2, a commonly applied rule of
thumb in studies of this kind.
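
The size of the shrinkage risk described above can be sketched numerically. The short Python example below is an illustration added for the reader, not an analysis from the original study: it applies the familiar adjusted R² formula and the expected value of R² under pure chance (both standard textbook results, assuming a linear regression with an intercept and normally distributed data) to N = 40 subjects and 18 predictors. The function names are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def expected_null_r2(n: int, k: int) -> float:
    """Expected sample R^2 when the k predictors are unrelated to the criterion."""
    return k / (n - 1)


N_SUBJECTS = 40    # final N reported for the parent study
N_PREDICTORS = 18  # predictors used in each final regression

# Chance alone is expected to yield this much "explained" variance.
print(f"expected R^2 under chance: {expected_null_r2(N_SUBJECTS, N_PREDICTORS):.3f}")

# Adjust the endpoints of the reported range of multiple correlations.
for r in (0.69, 0.87):
    r2 = r * r
    adj = adjusted_r2(r2, N_SUBJECTS, N_PREDICTORS)
    print(f"R = {r:.2f}  R^2 = {r2:.3f}  adjusted R^2 = {adj:.3f}")
```

Under these assumptions the lower end of the reported range shrinks to nearly zero while the upper end remains substantial. The adjustment does not account for predictors having been chosen from a larger pool, so it is likely to be optimistic for analyses built on pre-selected subscales.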

A second factor that leads to a tempering of confidence in the data
stems from the somewhat unorthodox analyses applied to it. Seventy-five
regression analyses were run, 15 (one for each of the criterion measures
employed in the study) using total test scores from each of the four
instruments employed in the prediction battery, 45 using the subscale
scores found within three of these instruments (the four tests used
in the study were made up of 1, 7, 11 and 12 subscales respectively),
and 15 using a combination of the best predictors from each of the four
tests in the battery as these were identified in the subscale analyses.
While the full range of data were reported for the various analyses (see
Chapter II), there is some question as to what to make of them. There is
also some question as to the appropriateness of the procedure used in
selecting the best of the subscale predictors for inclusion in the final
set of regression analyses. Subscales were selected on the basis of per
cent of criterion variance accounted for and it may have been more
appropriate to select on the basis of the correlation of subscales with
the criterion measures and other subscales. In any event, either or
both of these factors could have caused spuriously high correlations
to appear between predictor and criterion measures.

In contrast to the sources of error in the data that could have
given rise to spuriously high correlations, two sources of error could
have acted to reduce the magnitude of the correlations. The first of
these derives from the fact that the measures used in it were
"prototypic" in nature. This was the case for both the predictor and
criterion measures, for both were first generation in their development
and representative of relatively unexplored approaches to measurement.¹
As such the conceptual framework which guided item development in the
predictor and criterion measures was relatively primitive, the filmed
episodes around which the predictive instruments were built were
relatively weak, the item analyses used in their development were based
on responses of experienced teachers whereas the instruments were
subsequently used with inexperienced teachers, and the observation
system used in obtaining the criterion measures suffered from relatively
low reliability on the part of observers applying it. In combination
these limitations led to a set of predictor and criterion measures
which were more limited in range and quality than ultimately desired.

¹The Schalock, Beaird and Simmons study was actually designed as
a validation study of the three situational response tests. The
observation system from which the criterion measures were derived was
developed within the context of the study. Both sets of measures are
described in the next section of the report.

The second source of error in the study that could have acted to
reduce the magnitude of the correlations found was the failure of the
researchers to control for situational factors that interact with or
are thought to influence teaching behavior in the classroom. Factors
such as unplanned events, composition of the class, physical conditions
within the classroom and the nature of the activity in which teacher
and learners engage were not controlled, and since these are likely
to be significant determinants of teaching behavior their omission
or neglect should have reduced still further the magnitude of the
correlations found in the study.

In light of these kinds of limitations in measurement it is
remarkable that correlations of the magnitude demonstrated were obtained.

Given the data that derived from the study, and the many potential
sources of error that accompanied them, a proposal was submitted
immediately upon the completion of the study to the U.S. Office of
Education for its replication and extension. Three factors led to
the second proposal: (1) the essentially unprecedented results obtained
in the parent study, (2) the numerous potential or real sources of
error in it, and (3) the desire to avoid the pitfalls of uncritical
test adoption, that is, the desire to forestall the users of tests from
moving too quickly to adopt the instruments developed in the study for
use in their own programs of research or evaluation. Since these
instruments were new, and since the first predictive efforts with
them were so promising, there was danger that the measures might be
applied in areas where a basis for their application did not exist.
Cronbach (1950) states this danger well:

When an investigator has once obtained a satisfactory
validity coefficient he tends to install his program and
stop research. Other workers, reading his report of the
study, accept his test as valid and put it to work in their
own situations. This practice is unsound. In the first
place, any validation result is influenced by chance, and
correlations will fluctuate from sample to sample.
Consequently the test which proves best in one sample may not
prove to be the best predictor in another similar sample.

Even when the results are based on a large sample the
particular score or the particular weights most effective
in a multiple correlation are certain to change when a new
group is tested. If the same formula is applied to other
groups, correlation is sure to drop. Moreover, the supply
of men and the conditions of training change according to
time. It follows that the investigator must redetermine
the validity of his prediction technique periodically.



Four major objectives guided the present study: 

(1) to replicate the parent study; 

(2) to extend the design of the parent study to experienced, 
primary grade teachers; 

(3) to strengthen both replication studies by increasing the
number of subjects used in each and including in them
measures of situational variables that affect predictive
accuracy; and

(4) to investigate the effects on prediction of deriving
criterion measures from behavioral samples of varying
lengths.

The rationale for objectives (1) and (3) has already been spelled
out. The rationale for objective (2) was twofold: a) the desirability
of testing the power of the predictive measures with a variety of teacher
populations, and b) the theoretically based expectancy that the
situational-response tests would predict the behavior of experienced
teachers better than they would student teachers because of the wider
background of experience they can draw upon in interpreting the
situation before responding to it and because the items in the tests
were validated initially against a population of experienced teachers.
The rationale for objective (4) was simply that the systematic study of
behavioral sampling and its relation to the stability of measures
dependent upon it is long overdue. Observational methodology, especially
as it applies to prediction in situation, is inescapably dependent upon
behavioral sampling, yet there has been no research to date to suggest
clearly the nature of the sample needed to maximize prediction. While
the present study did not permit an exhaustive investigation of the issue
(the lengths of behavioral samples were limited to one, two and three
hours), it was hoped that it would provide a point of departure for
subsequent work.

In passing it should be pointed out that the investigation of
situational measures and their relationship to the predictability of
behavior in situation was also exploratory in nature, with situational
measures being limited to rather gross descriptions of classroom
structure and composition, events and the orientation of school
administrators toward classroom management. Much the same point of view
underlay this effort as underlay the study of behavioral sampling: a
great deal has been written about the significance of situational
variables in research design, but as yet no one has gotten serious
about their measurement. It was hoped that the present effort would
represent a start in that direction.

A fifth objective evolved as the study progressed, namely, to
strengthen the criterion measures used in it. This required extensive
work on the observation system developed in the parent study, and led
in part to a request for a 6-month extension of the study. A
by-product of this extension is the accompanying monograph (see
Attachment 1) that provides an overview of the observational system
that derived
from the effort. The system is referred to generally as the Teaching
Research System for the Description of Teaching Behavior in Context
and provides the most exhaustive measure of teaching behavior presently
available. As such, its development represents one of the major
contributions of the project.

The one major source of error inherent in the parent study that 
could not be reduced in the replication study was that attributable to 
the quality of the predictor measures; they had to remain unchanged. 

Because the research to be reported ties so closely to the Schalock,
Beaird and Simmons study, the next chapter in the report is devoted to
its review.

Chapter II

AN OVERVIEW OF THE SCHALOCK, BEAIRD AND SIMMONS STUDY 



As indicated previously, the Schalock, Beaird and Simmons study was
intended as a validation study of situational-response tests which used
motion picture sequences of classroom behavior as test stimuli. The
general hypothesis underlying the study was that in order to predict to
complex human behavior the tests to be used as predictors had to reflect
in their composition the complexity of the behavior to be predicted.
Specifically, the hypothesis tested in the study was that as test stimuli
increased in their representativeness of the behavior to be predicted,
and as the opportunity for response to those stimuli approached
"life-likeness" in their freedom, the predictive power of tests would
increase accordingly. Motion picture sequences of classroom behavior
were used in an effort to provide a stimulus situation comparable in
complexity to that involved in real life teaching.



The Predictor Measures 



Four predictor tests, varying on a continuum of stimulus and response
complexity, were used in the study: 1) a traditional paper-and-pencil
attitude scale, where the test stimulus was a statement describing an
orientation to the teaching function and response was defined by
agreement or disagreement to the statement (The Minnesota Teacher
Attitude Inventory, or MTAI), 2) a situational-response test where the
test stimuli were written descriptions of filmed classroom situations
and response was defined by agreement or disagreement to statements
made in relation to situational descriptions (The Word Test), 3) a
situational-response test where the test stimuli were motion picture
sequences of classroom situations and response was defined as in (2)
above (The Film Test), and 4) a situational-response test where the test
stimuli were also motion picture sequences of classroom situations but
the response was free, i.e., the subject responded to the filmed
situation as if she were the teacher in the situation (The Simulation
Test). It was hypothesized that the predictive power of the tests would
vary in the order of their listing above, with the MTAI being the
weakest predictor and the Simulation Test the most powerful. The
relationship of these tests to one another on a continuum of stimulus
and response complexity appears as Figure 1.



MTAI           Word Test           Film Test        Simulation Test
  |                |                   |                   |
SIMPLE ------------------------------------------------- COMPLEX

Words as Stimuli                         Life Behavior as Stimuli
Fixed Response                           Free Response

Figure 1. Continuum of test stimulus and response complexity.

The Word, Film, and Simulation Tests were constructed especially
for the project. Generally speaking, they were designed to assess a
teacher's orientation to classroom management and interpersonal
relationships with children. No attempt was made to assess orientation
to learner outcomes or the instructional strategies pertaining to them.
Situations portrayed in the tests were identified as particularly
challenging and representative of these dimensions of the teaching
process by first, second, and third grade teachers.

Word Test. The word test consists of 13 written descriptions of
actual classroom situations which occurred in the first, second, and
third grades of the Campus Elementary School (CES) at Oregon College
of Education. Each written description of a situation is followed
by 12 to 22 statements about the situation to which respondents agree
or disagree on a five point scale (Strongly Disagree to Strongly Agree).
The test provides a total score and 12 subscale scores (see Table 1).
Split-half reliability of the various scales ranges from .55 to .94.

Film Test. The film test consists of 13 motion picture sequences
of actual classroom situations which occurred in the first, second, and
third grades in CES. Each sequence is followed by 11 to 22 statements
about the situation portrayed to which testees respond in the same
manner as for the word test. The test provides a total score and 11
subscale scores. Split-half reliability for the various scales ranges
from .51 to .92.
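
The split-half reliabilities reported for the Word and Film Tests can be illustrated with a short sketch. The Python example below assumes an odd-even split of items and the Spearman-Brown step-up to full test length; the source does not say how the halves were formed or whether the correction was applied, so both are assumptions, and the function names and response data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores, corrected=True):
    """item_scores holds one list of item scores per respondent."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return spearman_brown(r_half) if corrected else r_half


# Hypothetical five-point responses: 6 respondents by 6 statements.
responses = [
    [5, 4, 5, 4, 4, 5],
    [2, 1, 2, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [4, 4, 3, 5, 4, 4],
    [1, 2, 1, 1, 2, 2],
    [3, 2, 3, 3, 3, 2],
]
print(round(split_half_reliability(responses), 3))
```

An odd-even split is used here only because it is a common convention; a first-half versus second-half split would be an equally reasonable reading of the source.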

Simulation Test. The simulation test consists of 12 motion picture
sequences of classroom events filmed in a single second grade at CES.
The sequences are arranged chronologically to represent a single day.
The test is accompanied by a cumulative folder detailing anecdotal
and test information for each of the "main characters" portrayed in
the film sequences. Sequences were filmed in such a manner that when
viewing the films the children are looking directly at the respondent.
Respondents record, verbatim, their reactions to the situations and
then describe (1) why they responded in the way they did, (2) what
they hoped to attain through their response, (3) their impression
of the child (when a single child was involved) and (4) why they
responded at the time they did. The test provides a total score and
seven subscale scores. Scores are derived through a content analysis
of written responses.

Subscales and reliability estimates for the Word and Film Tests
are presented in Table 1. Subscales for the Simulation Test are presented
in Table 2. Because the scores of the Simulation Test subscales are
derived from judges' ratings of respondent behavior, reliability
coefficients of the usual nature were not determinable. Instead,
inter-rater reliability was determined and was found to be consistently
acceptable.

Table 1. Subscales and Reliability Estimates for the Word and Film Tests

Word Test Subscales                  r

 1. Management I                   .782
 2. Interpersonal Awareness        .818
 3. Technique Awareness            .702
 4. Management II                  .834
 5. Orientation to Strategies      .546
 6. Orientation to Structure       .864
 7. Philosophy of Structure        .740
 8. Approach to Structure          .730
 9. Teacher Characteristics        .868
10. General Interpretation         .832
11. Specific Interpretation        .929
12. Specific Sanction              .784
13. Total Test                     .940

Film Test Subscales                  r

 1. Management I                   .732
 2. Interpersonal Awareness        .883
 3. Technique Awareness            .656
 4. Management II                  .752
 5. Response to Deviation          .651
 6. Philosophy of Structure        .510
 7. Approach to Structure          .718
 8. Teacher Characteristics        .631
 9. General Interpretation         .512
10. Specific Interpretation        .915
11. Specific Sanction              .785
12. Total Test                     .915

Table 2. Subscales of the Simulation Test

1. Management I
2. Interpersonal Awareness
3. Structure
4. Address to Individuals
5. Use of Questions
6. Trust
7. Academic Orientation



The Criterion Measures 

In order to provide a rigorous test of the basic hypothesis, it
was decided to use as criterion performance specific behavioral measures
instead of more typically used global measures of teaching success. To
this end systematic observational procedures were used as the primary
data source in the study. Performance ratings, which have plagued the
field of research on teacher effectiveness, were not used.

The system of observations used in the study, from which the
criterion measures were derived, involved both preconceived category
sets and rating scales. Category sets were developed for the description
of specific interactive behaviors that occurred between teacher and
child, and rating scales were used to assess some of the more global
qualities reflected by the teacher in the situation. Both categories and
rating scales were designed to assess the same parameters of the teaching
process that the predictor measures were designed to assess, namely,
orientation to classroom management and interpersonal relationships with
children.

Interaction was conceptualized as following essentially a
stimulus-response paradigm, where a stimulus (cue, demand) might or
might not be responded to and a response might or might not serve as an
invitation (stimulus) to a further response. The basic model for
observation was a three-stage interaction sequence: (1) a stimulus
(demand situation) operating upon the teacher within the classroom
setting, (2) a response (or lack of response) of the teacher to the
demand situation, and (3) the response of a child or group of children
to the teacher's response. With this model behaviors of the teacher
could be related explicitly to behaviors of children in her class. In
turn some child behavior could be related to behaviors of the teacher.
The model also permitted recording of interaction between teacher and
child that continued over time, i.e., where there were more than three
exchanges in the interaction sequence. The categories that made up the
system appear in Tables 3 and 4.

Table 3. Classes of Child Behavior Descriptive of Stimulus and Response
Conditions

Category Set 1
Classes of Child Behavior as Stimuli to Teacher Behavior

1. Ignoring of group goal
2. Intense social involvement
3. Involvement in academic content
4. Routine classroom functions
5. Rule breaking behavior
6. Conflict behavior

Category Set 3
Classes of Child Behavior as Responses to Teacher Behavior

a    Acceptance
     au     Unqualified Acceptance
     aq     Qualified Acceptance
     ar     Acceptance with Reward
t    Tending
i    Exchange of Information
pp   Postpones
ig   Ignores
r    Rejection
     ru     Unqualified Rejection
     rq     Qualified Rejection
     rpers  Rejection Through Attempts at Persuasion



Table 4. Classes of Teacher Behavior

Category Set 2

R    Judges behavior worthy of reward
     Ru     Unqualified reward
     Rq     Qualified reward
T    Tending
I    Exchange of information
D    Directing (non subject matter)
S    Structuring (subject matter)
     Sq     Direction giving (who, what, where, when)
     Sh     Explaining (how)
     Si     Information giving
     Sc     Correcting
Pp   Postpones
Ig   Ignores
C    Judges behavior worthy of change
     Cu     Attempts change with unqualified power
     Cq     Attempts change with qualified power
     Cpers  Attempts change through persuasion or suggestion



Affect and intensity ratings also accompanied the recording of each
category of teacher and child behavior. This represented an effort to
obtain a measure of the feeling tone and/or intensity of the interaction.
Four affect measures were used: (1) warmth, intensity, exuberance;
(2) distance, aloofness, hostility; (3) upset, concern, anxiety; and
(4) neutrality, or a lack of any of the above. Three levels of intensity
were used: low, moderate and high. Intensity ratings were always made
relative to the intensity of the situation in the classroom at the time.

In addition to the category sets nine rating scales were developed
to measure some of the more general characteristics of teacher behavior.
These were adapted from the scales developed by Schalock and O'Nail (1961)
in relation to parent-child interaction, and included measures of
(1) Tolerance for Changeworthy Behavior, (2) Warmth, (3) Respect for the
Individuality of Children, (4) Comfortableness, (5) Intellectuality,
(6) Consistency, (7) Tempo, (8) Organization, and (9) Harmony. All were
rated on a five point scale.

Data became available from the measurement system in the form of
category frequency counts and ratings. One of the unique features of the
observation system was that each recorded interaction identified who
initiated what kind of behavior and what the response to it was. Thus
it was possible to determine not only the frequency with which a
teacher responded to children with power, or ignored a child
initiation, or gave help, but also the kind or class of behavior that
elicited such responses. It was also possible to identify such factors
as the role the teacher played in the classroom, for example, whether
she tended to be the center of things through lecturing, structuring,
directing, etc., or whether she let the children assume the more active
role; whether the children tended to initiate interaction with her or
avoid her; and what kinds of behavior she tended to reward, punish or
ignore.
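
The kind of tally described above can be sketched as a small data structure. The Python example below is a hypothetical illustration, not the recording format actually used in the study: it assumes each observed exchange is stored as a three-stage record (child stimulus, teacher response, child response) using the category codes of Tables 3 and 4, and the record layout, function names and sample entries are all invented for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    child_stimulus: int    # Category Set 1 code (1-6)
    teacher_response: str  # Category Set 2 code, e.g. "Cu", "Sq", "Ig"
    child_response: str    # Category Set 3 code, e.g. "au", "rq", "ig"


# Hypothetical observation record for one half-hour.
records = [
    Interaction(5, "Cu", "au"),
    Interaction(5, "Cq", "aq"),
    Interaction(3, "Si", "i"),
    Interaction(5, "Cu", "ru"),
    Interaction(2, "Ig", "ig"),
]


def teacher_response_counts(recs):
    """How often each class of teacher behavior occurred."""
    return Counter(r.teacher_response for r in recs)


def elicitors(recs, teacher_code):
    """Which classes of child behavior preceded a given teacher response."""
    return Counter(r.child_stimulus for r in recs
                   if r.teacher_response == teacher_code)


print(teacher_response_counts(records))
print(elicitors(records, "Cu"))
```

Keeping the three stages together in one record is what allows both questions to be answered from the same data: how often a response occurred, and what kind of child behavior drew it.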

The rating scale data tended to support and extend the basic data 
obtained through the direct behavioral measures. 

The fifteen criterion measures used in the study were derived from
these category and rating scale data. Three features characterized the
criterion measures:

1) They were theoretically relevant, i.e., they related to
dimensions of the model of teaching behavior used as a
guide to instrument development throughout the study, and,
as a consequence, exhibited a close tie to the predictive
instruments that were developed;

2) They were complex in the sense that they represented a
pooling of a number of conceptually related behaviors
into a ratio or combination score. Theoretically this
provided a more stable and comprehensive measure than
would single classes of behavior; and

3) The measures took full advantage of the power of the
observational system in the sense that they tied to
(a) various classes of child behavior, (b) the teacher's
response to classes of child behavior, and (c) the
child's response to the teacher's behavior.

So far as we know, this is the first time that observational data have 
been used in this particular way, and on a priori grounds the measures 
derived by means of this procedure are most promising. 

The measures derived from the category data appear in Table 5.
The measures derived from the rating scale data appear in Table 6.
The reliability of these measures will not be reviewed here, but on
the basis of rather tenuous reliability data (see the original report)
all measures were judged to be minimally adequate. Upon use it was
found that the category based measures were essentially unrelated
or independent measures (low intercorrelations) while the rating scale
based measures were highly related.
Table 5. Criterion Measures Used in the Study That Were Based on Category
Frequency Counts

Measure                              Source

 1. Permissive-Restrictive           Total Cu + Cq + Cpers*
                                     ----------------------
                                     All Teacher Acts

 2. Power                            Total Cu
                                     ----------------------
                                     Total Cu + Cq + Cpers

 3. Consideration I                  Total Ru + Rq + Cq
                                     ----------------------
                                     Total Ru + Rq + Cq + Cu + Cpers
                                     + Pp + Ig

 4. Consideration II                 Total T + I + Sh + Si in
                                     response to child initiations
                                     in categories 2, 3 and 4
                                     ----------------------
                                     All of the above plus all other
                                     teacher responses to child
                                     initiations in categories 2, 3
                                     and 4

 5. Affective Orientation of         Total (+)
    Teacher                          ----------------------
                                     Total (+) + (-) + (/)

 6. Teacher Success in Obtaining     Total child responses of au,
    Cooperation or Compliance        [illegible] to teacher Cu, Cq,
                                     Cpers, Sq, or D actions
                                     ----------------------
                                     All of the above + all other
                                     child responses to these
                                     teacher actions

 7. Teacher Approachableness         Total child category 2, 3, and 4
                                     entries, including questions
                                     (2→, 3→, 4→) in Flow Pattern III

 8. Individual vs. Group Focus       Total teacher acts directed to
                                     group or part group
                                     ----------------------
                                     Total Teacher Acts

 9. Teacher vs. Child Focus          Total Flow Pattern III entries
                                     ----------------------
                                     Total Teacher Acts

10. Directing vs. Facilitating       Total Sq
                                     ----------------------
                                     Total Sq + Si + Sh + Sc,
                                     including questions, in any of
                                     these categories

11. Question vs. Statement           Total Si→ + Sh→ + Sq→
                                     ----------------------
                                     Total Si + Sh + Sq

*Cu, Cq, Cpers, etc. are category labels used in recording (see Tables 3 and 4).
For category definitions and examples, see Schalock, Beaird and Simmons,
pp. 400-434.
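
The ratio scores in Table 5 can be illustrated with a brief sketch. The Python example below computes the first three measures from category frequency counts, following the numerators and denominators as they are read from the table. The counts are hypothetical, the function names are invented for the example, and the handling of a zero denominator (returning 0.0) is an assumption, since the source does not say how empty categories were treated.

```python
def ratio(numerator, denominator):
    """Numerator over denominator; 0.0 when nothing was observed (assumption)."""
    return numerator / denominator if denominator else 0.0


def criterion_scores(counts, total_teacher_acts):
    """Measures 1-3 of Table 5 from a dict of category frequency counts."""
    def c(code):
        return counts.get(code, 0)

    change_attempts = c("Cu") + c("Cq") + c("Cpers")
    considerate = c("Ru") + c("Rq") + c("Cq")
    return {
        # 1. Permissive-Restrictive: change attempts over all teacher acts
        "permissive_restrictive": ratio(change_attempts, total_teacher_acts),
        # 2. Power: unqualified power over all change attempts
        "power": ratio(c("Cu"), change_attempts),
        # 3. Consideration I
        "consideration_1": ratio(
            considerate,
            considerate + c("Cu") + c("Cpers") + c("Pp") + c("Ig"),
        ),
    }


# Hypothetical counts pooled over one teacher's observation periods.
example_counts = {"Cu": 10, "Cq": 20, "Cpers": 10,
                  "Ru": 5, "Rq": 5, "Pp": 3, "Ig": 7}
scores = criterion_scores(example_counts, total_teacher_acts=200)
print(scores)
```

Expressing each measure as a ratio rather than a raw count keeps teachers with different overall rates of activity comparable, which appears to be the intent of the pooled scores.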
Table 6. Criterion Measures Used in the Study That Were Based on
Rating Scales

Measure                              Source

12. Permissive-Restrictive           Ratings on Tolerance for
                                     Changeworthy Behavior Scale

13. Consideration                    Rating on Respect for
                                     Individuality Scale

14. Classroom Climate                Ratings on Warmth and Harmony
                                     Scales

15. Total Teacher                    Total of All Scale Ratings
    Characteristics

Procedures

Subjects were senior women majoring in elementary education at Oregon 
College of Education and Oregon State University who were teaching in the 
primary grades during the 1963-64 school year. A total of 56 subjects 
participated in the study, although the sample attenuated for various 
reasons to a final N of 40. 

Prior to the academic quarter during which the subjects were engaged
in their student teaching, the four predictive measures (MTAI, Word Test,
Film Test, and Simulation Test) were administered. The tests were
administered in group settings and in a randomized order. Six half-hour
records of interactive behavior for each subject in the classroom,
collected on two separate days, constituted the behavioral sample. Each
day two half-hour observations were made in the morning and one
half-hour observation in the afternoon. A different observer observed
each subject each day. All observations were made during the last two
weeks of the subject's student teaching experience.

Results

Three levels of regression analyses were run: Level I related total
test scores to each of the 15 criterion measures used in the study; Level
II related the subscale scores from each of the three situational response
tests to the various criterion measures; and Level III related a
combination of subscales from the various tests that proved to be
effective predictors in the Level II analysis to the criterion measures.

Results of Level I Analysis. Fifteen regression analyses were run,
one for each criterion behavior, using in each case the total scores for
the MTAI, Word Test, Film Test, and Simulation Test as the predictor
variables. The per cent of criterion variance (R²) accounted for by
total scores of the four predictors ranged from 8.0 to 32.6. The L test
for ordered hypotheses (Page, 1963) was nonsignificant, failing to
substantiate the basic hypothesis.

Results of Level II Analyses. Sixty regression analyses were run,
one for each criterion behavior (15) for each of the four instruments
used as predictors. For the MTAI zero order coefficients of correlation
were computed since it does not have subscales. Per cent of criterion
variance (R²) accounted for by the MTAI ranged from zero to 5.3, with a
mean of 1.7; per cent of criterion variance accounted for by subscales
of the Word Test ranged from 14.9 to 50.1, with a mean of 30.9; per
cent of criterion variance accounted for by subscales of the Film
Test ranged from 18.7 to 49.6, with a mean of 38.3; and per cent of
variance accounted for by Simulation Test subscales ranged from 22.9 to
54.1, with a mean of 37.6. The L test for ordered hypotheses was
significant at the .003 level, substantiating the hypothesis of
difference in predictive effectiveness as tests moved toward the
approximation of lifelikeness. It will be noted, however, that the
hypothesis did not hold with respect to the predicted relationship
between the Film Test and the Simulation Test.
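
For readers unfamiliar with the L test for ordered hypotheses, the statistic can be sketched as follows. The Python example below is an illustration only: it ranks the predictors within each criterion, sums the ranks by predictor, weights each sum by the predictor's hypothesized position, and forms the usual large-sample normal approximation. It is not the analysis reported in the study, which covered all 15 criteria and included the MTAI; the three sample rows are the Word, Film and Simulation values for criteria 1 to 3 as they appear in Table 7, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def page_l(rows):
    """Page's L and its normal approximation.

    Each row is one block (here, one criterion); columns must already be
    in the hypothesized order, weakest predictor first. No tie correction
    is applied to the variance.
    """
    n, k = len(rows), len(rows[0])
    column_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            column_sums[j] += rank
    l_stat = sum((j + 1) * s for j, s in enumerate(column_sums))
    mean = n * k * (k + 1) ** 2 / 4
    variance = n * k ** 2 * (k + 1) ** 2 * (k - 1) / 144
    return l_stat, (l_stat - mean) / sqrt(variance)


# Word, Film, Simulation values for criteria 1-3 (from Table 7).
sample_rows = [
    [0.303, 0.281, 0.303],
    [0.384, 0.476, 0.360],
    [0.504, 0.476, 0.303],
]
l_value, z_value = page_l(sample_rows)
print(l_value, round(z_value, 3))
```

A large positive z indicates that the observed ranking follows the hypothesized order. With only three rows the approximation is rough, and exact tables would normally be consulted for samples this small.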

Results of Level III Analyses. Fifteen Level III regression
analyses were made, one for each criterion behavior. Each analysis
utilized 18 predictor variables: the MTAI total score, the five
subscales of the Word Test that proved to be the most effective
predictors of a given criterion measure, the five subscales of the Film
Test that were the most effective predictors of the same criterion, and
the seven subscales of the Simulation Test. In most cases, the subscales
used in any given regression analysis differed from those used in
other regression analyses.

Per cent of criterion variance accounted for in Level 111 analyses 
ranged from 49.0 to 73.7 with a mean of 38.8. The L test for ordered 
hypotheses was significant for these data at the .03 level, with the 
Simulation Test subscales consistently outranking the other predictors 
in accounting for criterion variance. The MTAl consistently ranked last, 
with the select subscales of the Word and Film Tests accounting for 
essentially the same amount of variance in the criterion measures. 
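
Because each Level III equation uses 18 predictors with a small sample, the raw R² values are likely to be optimistic, which is the shrinkage concern raised in the introduction. The sketch below applies the usual adjusted R² correction as an illustration only. It is not a computation reported in this study; the N of 40 is the parent-study figure given in the introduction, and the two R² values are the lowest and highest Level III figures quoted in the text.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    r2: raw proportion of criterion variance accounted for
    n:  number of subjects
    k:  number of predictors in the equation
    """
    if n - k - 1 <= 0:
        raise ValueError("need more subjects than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_SUBJECTS = 40    # assumption: parent-study N from the introduction
N_PREDICTORS = 18  # predictors per Level III equation

for raw in (0.490, 0.737):  # lowest and highest Level III values in the text
    print(raw, round(adjusted_r2(raw, N_SUBJECTS, N_PREDICTORS), 3))
```

With 18 predictors and 40 subjects the correction factor (n - 1) / (n - k - 1) is 39/21, so the unexplained variance is nearly doubled before it is subtracted from one.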

The multiple R's that derived from the three levels of analysis 
are presented in Table 7. On the basis of these data it was concluded 
that in general the results supported the basic hypothesis tested in 
the study, namely, that as test stimuli become more representative of 
the behavior to be predicted and as the opportunity for response approaches 
the freedom characteristic of life situations the power of prediction 
increases. 

Table 7. Per Cent of Criterion Variance (R²) Accounted for in the 
Three Levels of Regression Analysis 

             Level I        Level II                       Level III 
             (Total Test    (Subscale Scores)              (Selected Subscale 
Criterion    Scores)        Word     Film     Simulation   Scores) 

    1         .176          .303     .281     .303         .563 
    2         .323          .384     .476     .360         .504 
    3         .185          .504     .476     .303         .624 
    4         .123          .314     .397     .436         .624 
    5         .176          .292     .384     .292         .504 
    6         .144          .221     .384     .533         .504 
    7         .090          .152     .270     .230         .476 
    8         .137          .348     .292     .384         .593 
    9         .102          .270     .449     .384         .608 
   10         .048          .221     .410     .397         .757 
   11         .073          .303     .360     .449         .723 
   12         .084          .325     .436     .384         .548 
   13         .194          .314     .436     .436         .689 
   14         .221          .410     .490     .384         .656 
   15         .0784         .176     .185     .360         .490 

Chapter III 

AN OVERVIEW OF THE REPLICATION STUDY 

As indicated previously, the present study represented an extension 
as well as a replication of the parent study. Two extensions were 
undertaken: 1) the addition of situational data, that is, descriptions of 
classroom structure, composition, unplanned events, etc., to the original 
prediction scheme, and 2) the repetition of the study, including the use 
of situational measures, with experienced elementary school teachers. In 
addition, the study was designed to provide information on the effect on 
prediction of using behavioral samples of differing lengths in obtaining 
the criterion measures. In providing an overview, each of these aspects 
of the study will be described. 

The Replication Study 

Every effort was made to replicate the parent study exactly. Subjects 
were drawn from the same population, the same predictive measures were 
used,^2 criterion measures were equivalent but strengthened (see below), 
and the same analyses were applied. 

Subjects. Thirty-nine senior women, majoring in elementary education^1 
with specialization in the primary grades at either Oregon State University 
or Oregon College of Education, served as subjects for the replication 
study. Subjects were drawn from the pool of students who did their 
student teaching in Winter and Spring terms of the 1965-66 academic 
year and Fall and Winter terms of the 1966-67 academic year. Only 
students who volunteered to take part in the study and who did their 
student teaching within a 60 mile radius of Oregon State University 
were eligible for inclusion in the study. These were the same criteria 
used in the parent study, and approximately the same proportion of 
students met these criteria as they did in the parent study. No 
consistent differences appear in predictor or criterion measure scores 
for the students from the two institutions. 

^1 The project proposal called for 40 to 45 subjects, but an effort was 
made to increase this number to 60. With the aid of an extension to the 
project, fifty-eight student teachers were tested and/or observed to 
some degree of completeness, but due to illness, schedule conflicts, and 
other "end of the term" complications (student teachers had to be observed 
within a two week period at the end of their student teaching experience) 
complete data were obtained for only 39 of them. 

^2 Three of the tests used, the MTAI, the Word Test, and the Film Test, 
were completely equivalent since their administration and scoring required 
no coding or interpretation; the Simulation Test was as "equivalent as 
possible" considering its reliance upon coders for its scoring. 

Predictor measures. The same predictor measures that were used 
in the parent study, that is, the MTAI and the Word, Film and Simulation 
Tests, were also used in the replication study. They were administered 
in group settings at the close of the term that preceded the term in 
which the subjects did their student teaching. In contrast to the 
parent study, however, a totally random order of test presentation was 
not followed. The four tests required approximately five hours to 
complete, and it turned out to be impossible to get all subjects to arrange 
for a block of time of that length. It was possible to get everyone to 
arrange for half a day of testing, however, so the expedient of having 
them take three of the four tests during the scheduled time and one of 
them at home was followed. This was a workable solution since two of 
the tests, the MTAI and Word Test, were self-administered, paper and 
pencil measures. Operationally this meant that the Film Test and 
Simulation Test were always administered under supervised conditions, and 
either the Word Test or the MTAI was administered under non-supervised 
conditions, i.e., at home. Furthermore, since it was critical that the 
Film and Simulation Tests be administered under supervised conditions, it 
also meant that these two tests were always administered in a one-two 
order, and that the MTAI or Word Test was always third in the order of 
presentation. While the Film and Simulation Tests were always assigned 
their order of presentation randomly, and the Word Test and the MTAI 
were always assigned to the non-supervised condition randomly, the 
inability to follow a totally random assignment of test order 
represented a source of error in the data and a departure from the procedure 
followed in the parent study. 

Another departure from the parent study derives from the scoring 
procedures used with the Simulation Test. While the MTAI, Word and 
Film Tests require only the tabulation of the responses students 
make to them, the Simulation Test requires the coding or classification 
of descriptively written responses (protocols) that are made to it. 
This requires that coders master a category-rating scale system (see 
pp. 114-118 in Schalock, Beaird and Simmons, 1964) and demonstrate 
their reliability in applying it. 

Generally speaking, evidence as to the accuracy with which coders 
could apply the simulation coding system was disappointing. As in 
the parent study, project staff worked in two-man teams in making the 
category and rating scale assignments. Each member of a team 
independently read each response and independently assigned a score for 
each scale to that response, but, after comparing codings, arrived at 
a joint decision as to the "correct" ratings or category placement when 
there was disagreement. 

Evidence as to the reliability with which teams assigned their 
codings was obtained by having each team score eight protocols and 
then compare their codings. The results of this comparison appear 
in Table 8.^ As in the parent study, it was decided arbitrarily to 
identify as inter-team disagreement any variance of three or more 
frequencies in either the numerator or the denominator of each score. 
Using this base, each measure was then checked to see the number of 
disagreements between teams that appeared across the eight protocols. 
The cells that are enclosed with heavy lines in Table 8 are the scores 
on which the coder teams were judged unreliable by applying this 
criterion. 
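
The disagreement rule just described can be sketched in a few lines. This is a minimal illustration assuming each team's score on a measure is held as a (numerator, denominator) pair of frequencies; the function names and the eight pairs of scores are hypothetical and are not taken from Table 8.

```python
from typing import List, Tuple

Score = Tuple[int, int]  # (numerator frequency, denominator frequency)


def teams_disagree(a: Score, b: Score, threshold: int = 3) -> bool:
    """True when the teams differ by `threshold` or more frequencies
    in either the numerator or the denominator of a score."""
    return abs(a[0] - b[0]) >= threshold or abs(a[1] - b[1]) >= threshold


def count_disagreements(team_a: List[Score], team_b: List[Score]) -> int:
    """Number of protocols on which two teams disagree for one measure."""
    if len(team_a) != len(team_b):
        raise ValueError("both teams must score the same protocols")
    return sum(teams_disagree(a, b) for a, b in zip(team_a, team_b))


# Hypothetical scores from two coder teams on the same eight protocols.
team_a = [(4, 10), (2, 9), (7, 12), (0, 5), (3, 8), (6, 11), (1, 7), (5, 10)]
team_b = [(5, 10), (5, 9), (7, 15), (1, 6), (3, 8), (2, 11), (1, 9), (6, 14)]
print(count_disagreements(team_a, team_b))
```

Comparing numerator and denominator separately, rather than comparing the resulting ratio, follows the wording of the rule: two teams can arrive at similar ratios while counting quite different numbers of responses.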

It will be seen from these data that the coding teams were unable 
to agree upon categorization or scale placement much more than 60 per 
cent of the time. In most studies this level of agreement would be 
judged inadequate and further training of coders or refinement of the 
category-rating scale system used in the coding would be indicated. 
This was not a feasible solution in the present study for two reasons: 
1) additional training in the system did not appreciably affect 
reliability scores (the category definitions and scoring rules described 
for the measure in the final report of the parent project were 
sufficiently inadequate to make the demonstration of reliability impossible, 
no matter how intensive the training) and 2) the system could not be 
refined or altered as that would lead to a predictor measure that was 
different from that used in the parent study. As a consequence, the 
level of reliability demonstrated in Table 8 had to be deemed acceptable 
even though the data that thereby derived from the simulation measure 
were of a highly unreliable quality. It is interesting to note, however, 
that even with this degree of unreliability in the data the measures 
that derived from the data were relatively independent (see Table 9). 

^It will be noted that 13 category and scale scores appear in 
Table 8 while only 7 predictor measures come from the Test as a whole. 
This apparent discrepancy is accounted for by the combination of some 
of the 15 scales into single predictor measures. The factor analytic 
data upon which these combinations rest are reported in the parent 
study (see pp. 117-119). 

[Table 8 is not reproduced here; only part of its note survives: 
"...lined squares represent cells where observer teams were in sufficient 
disagreement as to be judged..."] 



Table 9. Intercorrelation Matrix for Subscales of the Simulation Test 

Scales                          1      2      3      4      5      6      7 

1 Management                  1.00   -.63    .48   -.19   -.12    .22     -- 
2 Interpersonal Awareness            1.00    .43   -.52    .13   -.22   -.70 
3 Structure                                 1.00   -.40    .13  -.006   -.45 
4 Address to Individuals                           1.00   -.18    .12    .33 
5 Use of Questions                                        1.00    .01   -.05 
6 Trust                                                          1.00    .26 
7 Academic Orientation                                                  1.00 

-- One coefficient in row 1 is missing in the source; the five surviving 
values are shown in their original order, so their column positions are 
uncertain. 
Criterion measures. At the time that the parent study was undertaken 
(1962-1964) none of the classroom interaction measures that then 
existed were particularly appropriate to the purposes of the study. The 
focus of the predictor measures was upon discipline or classroom 
management behavior, and with the exception of the work of Hughes (1959), and 
to some extent that of Medley and Mitzel (1958), existing measures did 
not take that dimension of teaching behavior into account. As a 
consequence an effort was made to develop a system for describing 
teacher-learner interaction that focused upon both classroom management and 
instructional behavior. The decision to undertake such an effort 
stemmed from a history of experience in the application of observational 
methods to the study of parent-child interaction (Moustakas, Sigel and 
Schalock, 1956; Schalock and O'Neill, 1960) and a deep dissatisfaction 
with the superficiality of measures of teaching behavior being proposed 
at that time by Hughes (1959), Flanders (1960), Smith (1960), and 
Medley and Mitzel (1958). 

As anyone who has attempted to develop an observational system 
knows, it is a time-consuming and difficult task. As a consequence, 
while it was possible to develop a system of observation that provided 
the kind of data needed in the parent study, the system itself was 
little more than a first approximation to the system ultimately desired. 
This became clearer and clearer as the present study progressed, and as 
a result the decision was made to extend the system within the context 
of the present study to a more finished state. It was partially toward 
this end that a six month extension to the study was obtained. 






the full range of factors which influence it. An overview of the system 
is presented in the accompanying monograph (see Attachment 1). Detailed 
category definitions, examples, and operational procedures appear in a 
training manual that is now being completed (Schalock and Micek, 1968). 

The use of an expanded observation system in the replication study 
represented a potential problem: how does one use a "different" 
measurement system and still obtain essentially the "same" measures? A further 
complication stemmed from the decision to eliminate from the study the 
four criterion measures used in the parent study that depended upon 
rating scale data (measures 12 through 15 in Table 6, p. 18). As 
indicated previously, the intercorrelations between these measures 
were sufficiently high as to make them unacceptable as independent 
measures. With these changes, the decision was finally reached to 
include eight measures that were representative of the category based 
measures used in the parent study and three new measures made possible 
by the expanded observation system. The total set of criterion 
measures used in the replication study is listed in Table 10. The 
reliability of observers in applying these measures, as this is 
reflected in the comparability of criterion measures obtained by 
individual observers observing simultaneously but independently, is 
presented in Tables 11 and 12. While these data were not as supportive 
of observer reliability as had been desired, the press of the project 
schedule demanded that they be accepted so that field observations could 
be undertaken. Fortunately, even though some of the measures appeared 
to be relatively unreliable prior to formal observation, they proved to 
be relatively independent as formal measures. The intercorrelation data 
for the criterion measures, as these were derived from the final data 
pool on both student and experienced teachers, are presented in Table 13. 
Table 10. Criterion Measures Used in the Replication Study 

Measure*                              Source** 

 1. Degree of Control                 All teaching moves which reflect censorship 
                                      -------------------------------------------- 
                                      All teaching moves 

 2. Orientation to the Use of         All censorship moves which rely upon power 
    Power in Maintaining Control      as a basis for behavioral change 
                                      -------------------------------------------- 
                                      All censorship moves 

*3. Teacher Response to Deviant       All non-censoring responses to deviant 
    Behavior                          behavior 
                                      -------------------------------------------- 
                                      All responses to deviant behavior 

*4. Orientation to the Use of         All instances of positive evaluation 
    Positive Reinforcement            -------------------------------------------- 
                                      All evaluative moves 

 5. Consideration in Response         All non-censoring responses to academic 
    to Academic Initiations           initiations (Flow III) 
                                      -------------------------------------------- 
                                      All responses to academic initiations 

 6. Consideration in Response         All non-censoring responses to non-academic 
    to Non-Academic Initiations       initiations (Flow III) 
                                      -------------------------------------------- 
                                      All responses to non-academic initiations 

*7. Consideration in Response         All non-censoring responses to non-academic 
    to All Non-Academic               behavior (Flow II) 
    Behavior                          -------------------------------------------- 
                                      All responses to non-academic behavior 

 8. Affective Orientation             All instances of positive affect (+) 
                                      -------------------------------------------- 
                                      All instances of affect 

 9. Teacher Approachableness          All instances of student initiations (Flow III) 
                                      -------------------------------------------- 
                                      All instances of student and teacher 
                                      initiations 

10. Individual vs. Group Focus        All instances of interaction with a single 
                                      child 
                                      -------------------------------------------- 
                                      All instances of interaction 

11. Use of Inquiry in Instruction     All instances of inquiry in relation to 
                                      academic matters 
                                      -------------------------------------------- 
                                      All teacher acts in relation to academic 
                                      matters 

*The measures that are new to the replication study are starred. 

**The specific categories of behavior making up [illegible] can 
be found in the monograph that provides an overview of the teaching 
System of Observation (see Attachment 1) and the Training Manual that 
accompanies the system (Schalock and Micek, 1968). 
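
Each criterion measure in Table 10 is listed with two sources, one above the other, which reads as a ratio of one frequency count to another (for Degree of Control, censorship moves over all teaching moves). The sketch below shows that reading under stated assumptions: the function name is illustrative, the counts are invented, and returning no score when the denominator is zero is a choice made here, not a rule stated in the source.

```python
from typing import Optional


def criterion_ratio(numerator_count: int, denominator_count: int) -> Optional[float]:
    """Proportion of the denominator events that also meet the numerator definition.

    Returns None when no denominator events were observed, since the ratio
    is then undefined (an assumption of this sketch).
    """
    if numerator_count < 0 or denominator_count < 0:
        raise ValueError("counts cannot be negative")
    if numerator_count > denominator_count:
        raise ValueError("numerator events are a subset of denominator events")
    if denominator_count == 0:
        return None
    return numerator_count / denominator_count


# Hypothetical counts from one observation period.
censorship_moves = 12
all_teaching_moves = 240
print(criterion_ratio(censorship_moves, all_teaching_moves))  # Degree of Control
```

Scores built this way fall between zero and one, which matches the range of the entries in Tables 11 and 12.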
Table 11. Reliability of Observers in Applying the TE Observation System in 
[date illegible], as this is Reflected in the Comparability of Criterion 
Measures that Derive from Simultaneous but Independent Observations. 

Criterion    Observers*          Observers           Observers           Observers 
Measure      A     B     C       D     A     F       A     B     F       C     E     F 

    1       .03   .06   .02     .07   .09   .05     .05   .04   .04     .08   .04   .06 
    2       .85   .65   .72     .55   .67   .73    1.00   .85   .92     .75   .56   .56 
    3       .00   .00   .00     .00   .00   .00     .00   .00   .33     .00   .00   .00 
    4       .34   .45   .53     .32   .34   .37     .36   .44   .33     .40   .63   .67 
    5       .40   .32   .47     .22   .43   .61     .93   .76   .72     .92   .93   .81 
    6       .00   .25   .00     .33   .33  1.00     .00   .00   .33     .00   .00   .00 
    7       .25   .25   .50     .00   .33   .33     .00   .00   .00     .50   .25   .75 
    8       .00   .50   .75    1.00  1.00  1.00    1.00  1.00   .50     .00   .00    -- 
    9       .12   .16   .14     .07   .08   .10     .11   .15   .17     .54   .43   .56 
   10       .22   .27   .23     .44   .45   .52     .45   .31   .36     .21   .25   .18 
   11       .45   .36   .54     .30   .29   .33     .25   .32   .12     .26   .22   .28 

Criterion    Observers           Observers           Observers           Observers 
Measure      C     D     E       A     B     D       B     C     E       D     E     F 

    1       .09   .15   .11     .08   .11   .06     .04   .02   .03     .07   .11   .08 
    2       .69   .87   .52    1.00   .53   .63     .73   .81   .76     .82   .96   .55 
    3       .00   .00   .00     .00   .00   .00     .00   .00   .00     .00   .00   .00 
    4       .62   .27   .54     .15   .17   .23     .27   .33   .26     .33   .35   .35 
    5       .81   .91   .69     .75   .78   .76     .78   .76   .92     .43   .68   .72 
    6      1.00   .50   .00     .00   .00   .00     .00   .00  1.00     .00   .00   .00 
    7       .00   .00   .00     .50   .00   .25     .00   .00   .00     .25   .33   .25 
    8       .50   .25   .50     .00   .00   .00    1.00   .25   .25    1.00  1.00  1.00 
    9       .30   .35   .34     .36   .38   .45     .28   .19   .17     .41   .43   .40 
   10       .10   .08   .06     .42   .31   .36     .11   .08   .13     .38   .32   .37 
   11       .06   .08   .07     .31   .32   .46     .19   .24   .17     .25   .27   .32 

-- Entry not legible in the source. 

*The study required six independent observers to be in the field during the time of 
[illegible]. To demonstrate reliability with the observation system they 
[illegible] with two other observers. Three separate teachers were 
[illegible]. Categories on which observers were especially unreliable 
are underlined. 
Table 12. Reliability of Observers in Applying the TE Observation System in 
November, 1966, as this is Reflected in the Comparability of Criterion 
Measures that Derive from Simultaneous but Independent Observations 

Criterion    Observers           Observers           Observers           Observers 
Measure      A     B     C       D     A     F       A     B     F       C     E     F 

    1       .07   .07   .01     .09   .10   .08     .05   .04   .04     .04   .02   .05 
    2       .92   .73  1.00     .80   .56   .58    1.00   .67   .25     .80  1.00   .83 
    3       .00   .00   .00     .00   .33   .00     .00   .00   .00     .00   .00   .00 
    4       .37   .46   .63     .32   .34   .50     .32   .44   .55     .25   .67   .61 
    5       .00   .67   .86     .29   .56   .40    1.00   .78   .60     .92   .96   .76 
    6       .67  1.00   .00     .00   .64   .50     .00   .00   .00     .00  1.00   .50 
    7       .25   .00   .25     .33   .14   .00     .33   .00   .00     .50   .00   .00 
    8      1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00     .00  1.00   .00 
    9       .07   .13   .12     .11   .15   .17     .06   .19   .08     .30   .34   .35 
   10       .26   .26   .22     .21   .26   .19     .36   .33   .45     .44   .46   .48 
   11       .30   .27   .30     .45   .31   .55     .25   .32   .11     .22   .34   .34 

Criterion    Observers           Observers           Observers           Observers 
Measure      C     D     E       A     B     D       B     C     E       D     E     F 

    1       .03   .06   .05     .09   .18   .10     .08   .11   .05     .08   .03   .07 
    2       .67   .83   .50    1.00   .67   .75     .70   .83   .80     .71  1.00   .50 
    3       .00   .00   .00     .00   .00   .00     .00   .00   .00     .00   .00   .00 
    4       .80   .14   .20     .12   .10   .27     .29   .10   .29     .75   .34   .36 
    5       .85   .90   .60     .75   .72   .96     .75   .76   .88     .94   .93   .97 
    6       .00   .00  1.00     .00  1.00   .00     .00   .00   .00     .00   .00   .00 
    7       .00   .00   .00     .50   .00   .00     .00   .00   .00     .00   .00   .00 
    8      1.00  1.00   .00     .00   .00   .00    1.00  1.00  1.00    1.00   .33  1.00 
    9       .28   .19   .22     .54   .43   .51     .30   .26   .34     .42   .48   .58 
   10       .38   .43   .46     .08   .06   .06     .45   .39   .40     .11   .08   .09 
   11       .31   .31   .24     .25   .33   .19     .06   .06   .06     .19   .16   .14 



Table 13. Intercorrelations for the Criterion Measures 
Classroom observations. The same procedures were followed in making 
classroom observations as were followed in the parent study. Each 
subject was observed for nine 30-minute periods during the final two weeks 
of her student-teaching experience. Observations were made on days 
approximately one week apart by three different observers. Three 
30-minute observations were made each day: two in the morning and one in 
the afternoon. During each observation period, subjects had primary 
teaching responsibilities in their rooms. Morning observation periods 
were characterized by relatively structured activities involving students 
in group settings. Afternoon periods, on the other hand, were generally 
characterized by unstructured individual activities. Such a schedule 
was devised to obtain a ratio of observations of teacher behavior in 
structured and unstructured activities roughly equivalent to that found 
in regular school activities. 

Prior to actual observation, participating school personnel and 
college supervisors were oriented to the project and procedures to be 
employed. In addition, a practice observation was made in each subject's 
room one week prior to actual observations. After the practice 
observation, the supervising teacher, subject, and students were permitted to 
ask questions and express concerns regarding the observation procedure. 

When observers arrived to record actual observations, they spent 
ten or fifteen minutes becoming familiar with the nature of interaction 
in the classroom, the setting, the traffic patterns, etc. This was, in 
a sense, an "acclimatization" period for observers. Once observation 
began, it continued for 30 minutes uninterrupted. While observing, 
observers were seated in unobtrusive positions that enabled them to see 
the subject and hear all that she said to students. There was no 
interaction between observer and students or between observer and subject. 
When the 30-minute period of observation was complete, the observer 
quietly left the room, returning when the next observation period was 
scheduled. 

Extension #1: The Addition of Situational Data to the Original 
Prediction Scheme 

As indicated previously, the prediction scheme in the parent study 
included only test scores: situational factors affecting the behavior 
being predicted were not taken into account. Factors such as unplanned 
events, composition of the class, physical conditions within the 
classroom and the nature of the activity in which teacher and learners engaged 
were not controlled. Since these are likely to be significant 
determinants of teaching behavior, the present study attempted to include 
them in it. The aim of the present effort was to obtain prototypic 
measures of such factors and include them in the prediction scheme as 
control variables to see if their inclusion would significantly increase 
the amount of variance accounted for in the criterion measures. 

Toward this end, seven dimensions of the classroom setting were 
identified: (1) the subject matter and the activity being pursued, 
(2) the organization of the classroom, for example, small study groups, 
individuals around a large work table, individuals at their desks, (3) the 
number of learners in the classroom, (4) the general characteristics of 
the learners in the classroom, for example, their personality 
characteristics, their capabilities, age, and sex, (5) the physical characteristics 
of the classroom, for example, the space available per learner, the 
presence of individual desks or tables, heat, ventilation, lighting, the 
proximity to activity on the playground or in the halls, (6) the 
philosophy of the school administration, particularly the building principal, 
in relation to classroom activity, and (7) unplanned events which are 
disruptive to planned learning experiences, for example, a fire drill, 
an unanticipated visitor, a child becoming ill, building repair or 
workmen's activity nearby. Measures for all of these factors were developed. 

Two of them, the subject matter and activity in which the class is involved, 
and the organization of the classroom, are described in connection with 
and at the same time that teacher and learner behavior are described; 
that is, they are part of the observation system (see Figure 2). 



[Recording form. Header fields: SUBJECT, OBSERVATION 1 2 3, OBSERVER, 
PAGE, DATE. Columns: Classroom Structure; Activities and Topics; 
Progressive Record of Teacher-Learner Interaction.] 

Figure 2. The form on which the categories descriptive of teacher- 
learner interaction are recorded. 
A diary record of the unusual or unplanned events that occur during the 
day on which the observations are made is kept by the teacher. All of 
the other setting measures, that is, the number of children in the class 
and their characteristics, the physical characteristics of the classroom, 
and the philosophy of the school administration in relation to the 
activities that take place in the classroom, are obtained through 
interview, either prior to or subsequent to the observation. In the paragraphs 
which follow, each of the situational measures is described briefly. 

Subject matter, activity, and classroom organization. The subject 
matter in which a class is involved, the activity being pursued within 
that subject matter, and the classroom organization that accompanies it 
are recorded at the same time and on the same recording sheet as is the 
teacher-learner interaction (see Figure 2). Each observation begins with 
a notation as to subject matter, activity, and classroom organization, 
and these notations continue opposite the recording of the interaction 
that is occurring throughout the observation period. Time also is noted 
so that it becomes possible to identify the length of time spent within 
any given activity, classroom organization, etc. By including time, 
activity, classroom organization and subject matter in the observation 
record it is possible to analyze teacher-learner interaction against any 
or all of these factors. 
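
How time notations make it possible to identify the length of time spent within an activity can be shown with a small sketch. It assumes each notation is stored as the minute at which a label begins, with the 30-minute observation period supplying the end point; the function name, the storage format, and the example labels are illustrative and do not come from the observation system itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def minutes_per_label(notations: List[Tuple[int, str]], end_minute: int) -> Dict[str, int]:
    """Total minutes spent under each label.

    notations:  (minute at which the label begins, label), in chronological order
    end_minute: minute at which the observation period ends
    """
    totals: Dict[str, int] = defaultdict(int)
    # Pair each notation with the start of the next one (or the period's end).
    boundaries = [start for start, _ in notations[1:]] + [end_minute]
    for (start, label), stop in zip(notations, boundaries):
        if stop < start:
            raise ValueError("notations must be in chronological order")
        totals[label] += stop - start
    return dict(totals)


# Hypothetical record for one 30-minute observation.
record = [
    (0, "reading: oral reading, small groups"),
    (12, "arithmetic: seatwork, individual desks"),
    (25, "reading: oral reading, small groups"),
]
print(minutes_per_label(record, end_minute=30))
```

The same grouping could be applied to counts of coded interaction instead of minutes, which is what analyzing teacher-learner interaction against activity or classroom organization would require.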

Number and characteristics of children in a classroom, the physical 
characteristics of a classroom, and the philosophy of the school 
administration toward conduct in the classroom. As indicated above, information 
on these variables is obtained through an interview with the teacher. The 
specific items in the interview schedule are listed in Figure 3. The items 
included in the schedule were identified by elementary school teachers as 
factors which frequently and significantly influence that which occurs 
within their classrooms. Since the titles of the factors are 
self-explanatory, no further comment will be made about them. The interview 
is usually administered after the observation has been completed so as 
to obtain information on the number of children absent during the 
observation, but it may be administered before the observation if so desired. 
Also, the interview schedule, in the form of a questionnaire, may be 
given to the teacher to complete by herself. 

Figure 3. The interview schedule used in obtaining a description of 
the situational factors affecting the management behavior 
of teachers. 

TEACHER 
GRADE LEVEL 
DATE 

I CLASSROOM RELATED FACTORS 

A. Physical Features of the Classroom 

1. Size of room in relation to size of class. 

a) square footage 

b) teacher's feelings about adequacy of space 

2. Seating arrangements in the room, i.e., tables and chairs 
vs. desks, etc. (describe) 

3. Facilities for toilet and drinking (if present, describe) 

4. Susceptibility of room to noise and student traffic. 
(Teacher's estimate; if susceptible, have teacher 
describe the nature and/or amount.) 

5. Availability of educational materials, teaching aids, etc. 
in the room (teacher's estimate of adequacy). 

B. Characteristics of the Class 

1. Number of students in the class, plus the number absent 
on day of observation. 

2. Boy-girl ratio. 

3. Number of exceptional children in the class, e.g., 
intellectually, physically, and emotionally handicapped, 
intellectually superior, etc. (List number by class of 
exceptionality.) 

4. The number of children who are habitually disruptive of 
the class plus number absent on days of observation 
(obtain from teacher's records). 

5. Principal's estimation of the socio-economic status of 
the families of the students in the school (provide one 
of three estimates: predominantly lower SEC; predominantly 
middle and/or upper middle SEC; fairly even cross-cutting 
of the lower and middle SEC). 

6. Principal's estimate of the mobility of the students' 
families (provide one of three estimates: a high 
proportion mobile, e.g., service or migrant worker families; 
a high proportion permanent residents; a fairly even 
distribution of mobile and permanent residents). 

II SYSTEM RELATED FACTORS 

A. Official Policy Toward Classroom Discipline and Control 

1. Policy toward noise in the classroom (describe; obtain 
through principal). 

2. Policy toward the handling of "discipline problems" by 
teachers (describe; obtain through principal). 

B. Classroom organization, e.g., self contained, cooperative or 
nongraded, team teaching, etc. (describe; obtain through 
principal). 

C. Curricular innovations, e.g., the "new math," experimental 
biology courses, etc. (describe; obtain through principal). 

iin.nHrinated events . One of the setting factors identified by 
teachers which often influences teacher-learner interaction is that of 
unanticipated events. These can range from a sudden snow storm or an 
unanticipated assembly to a child becoming ill or a stray dog finding 
his way into the room. By definition, an unusual event is one which 
interferes with that which is planned in relation to instruction. In 
order to obtain information as to the nature and occurrence of these 
events each teacher that is observed is asked to record at the end of 
the observation period any unanticipated events which occurred either 
prior to or during the time of observation that in her opinion had a 
significant influence upon that which occurred during the course of the 
observation. The recording form that is provided the teacher for this
purpose appears as Figure 4.

P«di^ measures d5Ti^ 

Four global measures designed to reflect the complicating effects of 
setting factors upon the task of classroom management were derived from 
the descriptions of setting variables provided by the interview schedules
outlined in Figures 3 and 4.

TEACHER

GRADE LEVEL

OBSERVATION DAY (circle day) 1 2 3

DATE

It is well known by teachers that factors such as the tempera-
ture or ventilation of a classroom, the physical well-being of
children, the anticipation of a special event or holiday, the
appearance of an invited or uninvited animal, the occurrence of
a fire or a construction project nearby, or the well-being of the
teacher herself can have a marked effect upon behavior occurring
within the classroom. Since our research requires as natural a
picture as possible of classroom behavior, would you please note
below any circumstances that you feel may have caused the behavior
observed in your classroom to be different from that which usually
occurs.

If unusual events did occur, would you indicate also the
approximate time that they occurred.

The examples of unusual events cited above are, of course,
only suggestive of the wide range of events which can affect a
classroom. When you are thinking about that which may have affected
behavior in your own classroom please feel free to include anything
and everything that may have made it an "unusual" situation.

The observer will pick this record up from you at the close of
the last observation period on each observation day.



Figure 4. The form for recording unusual events which affected or
could have affected behavior in the classroom during
the time of observation.

1. A descriptor of the physical setting. Items I A 2, 3, 4,
5 and II B and C from the interview schedule outlined in Figure 3 were
combined into a global 3-point scale designed to reflect the teacher's
feelings or judgment about the adequacy of the physical features of the
setting in which she taught. A score of zero on the scale indicated
that, all factors considered, the physical characteristics of the class-
room seemed to be somewhat handicapping from the point of view of
classroom management; a score of 1 indicated that they were neither
particularly handicapping nor particularly facilitating; and a score
of 2 indicated that they facilitated the management task.

2. A descriptor of the administrative setting. Item II A 1 from
the interview schedule outlined in Figure 3 provided the descriptive
data from which this measure was derived. The measure was scored in
the same way as measure 1, namely, a score of zero indicated that the
administrative setting handicapped the task of classroom management,
a score of 1 indicated that it was neither particularly handicapping nor
facilitating, and a score of 2 indicated that it was facilitating.

3. A descriptor of the characteristics of the class. Items I B 1,
2, 4, 5 and 6 from the interview schedule outlined in Figure 3 provided
the descriptive data from which this measure was derived. In contrast
to measures 1 and 2, measure 3 represented an algebraic summation of
each of the five factors that fed into the measure. Before summation
each of the five factors was scored from 0 to 2, following the same
rationale as was used in scoring measures 1 and 2. This procedure
permitted measure 3 to have a range of 0 to 10. The criteria followed
in arriving at the various subscores were:

a) fewer than 18 students in a class yielded a score of 2
and more than 32 yielded a score of zero;

b) a ratio of girls to boys in the class that favored the
girls, i.e., greater than a 1:1 ratio, yielded a score
of 2 and a ratio of boys to girls that exceeded 2:1
yielded a score of zero;

c) a class in which no children were habitually disruptive
received a score of 2, whereas a class which had 3 or
more children in it who were habitually disruptive
received a score of zero;

d) a class which was made up of children from predominantly
middle and upper class families received a score of 2
whereas a class which was made up of children predomin-
antly from either lower-lower or upper class families
received a score of zero; and

e) a class which was made up of children from families
which were predominantly permanent in the community
received a score of 2 whereas a class which was made
up of children from families which were predominantly
mobile received a score of zero.
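
The five criteria for measure 3 can be expressed as a short scoring routine. The sketch below is illustrative only and is not taken from the study's own procedures. It assumes that any case falling between the two stated extremes receives the intermediate score of 1 (the text states only the conditions for scores of 2 and zero), and the function names and category labels are hypothetical.

```python
# Illustrative scoring of measure 3 (characteristics of the class).
# Assumption: any case between the two stated extremes scores 1.

def class_size_score(n_students: int) -> int:
    if n_students < 18:
        return 2
    if n_students > 32:
        return 0
    return 1  # assumed intermediate score


def sex_ratio_score(n_girls: int, n_boys: int) -> int:
    if n_girls > n_boys:        # ratio favors the girls
        return 2
    if n_boys > 2 * n_girls:    # boys to girls exceeds 2:1
        return 0
    return 1  # assumed intermediate score


def disruptive_score(n_disruptive: int) -> int:
    if n_disruptive == 0:
        return 2
    if n_disruptive >= 3:
        return 0
    return 1  # assumed intermediate score


# Hypothetical labels for the principal's two three-way estimates.
SEC_SCORES = {"middle_and_upper": 2, "mixed": 1, "lower_lower_or_upper": 0}
MOBILITY_SCORES = {"permanent": 2, "mixed": 1, "mobile": 0}


def class_descriptor(n_students, n_girls, n_boys, n_disruptive, sec, mobility):
    """Sum of the five subscores; the total ranges from 0 to 10."""
    return (
        class_size_score(n_students)
        + sex_ratio_score(n_girls, n_boys)
        + disruptive_score(n_disruptive)
        + SEC_SCORES[sec]
        + MOBILITY_SCORES[mobility]
    )
```

Under these assumptions a small, settled class with no disruptive children scores 10, and a large, mobile class with several disruptive children scores zero.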

4. A descriptor of unusual events. The interview schedule outlined
in Figure 4 provided the descriptive information from which this measure
was derived. Like measure 1, the information obtained from the inter- 
view was forced into a single three-point scale describing the exten- 
siveness and/or criticalness of the unusual events that occurred during 
a day that classroom observations were made. If no unusual events 
occurred, a value of zero was assigned; if three or more unusual events 
occurred, or if a single event was extremely disruptive, a value of 2 
was assigned. 

A fifth measure descriptive of the classroom setting was also
used in the study, namely, a measure describing the instances of
behavior that were disruptive to the class during the classroom obser-
vations. This measure was derived from observational data rather than
interview data and consisted simply of the number of such instances
that occurred. Four scores were actually derived from this measure: 1) a
score representing the total number of such instances, 2) a score
representing the number of incidents that were non-academic in nature
and directed toward the teacher, 3) a score representing the number
of incidents of the same kind directed to other children, and 4) the
number of instances that had an academic focus but which were suffi-
ciently inappropriate in nature to cause them to be disruptive.

By combining the four measures derived from the observation data
and the four derived from the interview data, a total of eight setting
or situational measures were available for use in the study as predic-
tors. The intercorrelations for these measures appear in Table W.

Extension #2: The Repetition of the Study, Including Use of
Situational Measures, with Experienced Primary Grade Teachers

With one exception the same measures and procedures as outlined
in the replication and extension of the parent study with student
teachers were followed in the extension of the parent study to experi-
enced teachers. The one exception occurred in relation to the time
periods in which tests could be administered and observations could be
made. In contrast to the rather rigid schedule of testing at the end
of the term prior to student teaching and observation within the last
two weeks of the student teaching experience, the experienced teachers
could be tested and observed at any time.

Subjects in the study were thirty-nine experienced primary grade
teachers drawn from the school districts in which the student teachers
who participated in the study did their student teaching, and in the
same proportion. Only those who volunteered for the project were
included in it. No restrictions were placed upon length of teaching
experience beyond having taught for at least one year prior to taking
part in the study. The testing and observations required by the study
were fitted to the convenience of the teachers within a given district
and to the time schedule of project personnel.

rnve.tieation of th, Sa ^ MS. 

Scales of Diff^ g^^Serion He,su^ 

As indicated previously, the rationale underlying the inclusion
of an investigation of this kind in the present study rests upon the
fact that the study depends upon behavioral sampling for its criterion
measures, but as yet there is no conclusive evidence as to the length
or number or distribution of behavior samples needed in order to
obtain stable or representative criterion measures. The problem
derives from the fact that teacher behavior is situation bound, that is,
that on any given day or on different occasions within a day situational
influences can be expected to bring about a great deal of variation in
observed behavior. This problem is not unlike other sampling problems
encountered within the behavioral sciences, and it is generally assumed
that situational influences can be balanced out when the sample of
observed behavior is lengthened. The question still to be answered,
however, is "What are the fewest number of observations required to
obtain a performance measure reflective of a balance of situational
influences?"

The plan of the present investigation was relatively simple:
compare the magnitude of the correlations derived from the prediction
scheme with criterion measures based upon 1 hour, 2 hours and 3 hours
of observation time, respectively. This was to be done for both student
and experienced teachers. While it was recognized that such a design
was far too simple to assess the issue of behavioral sampling in any
final sense, it was felt that it was sufficient to provide information
that would be of use in the present study and in the design of future
studies on the issue. Criterion measures in the parent study were
based upon two hours of observation.

- « aai BsaaEiaga a aa ssant aatuM 

and S8 Vslusble m Possible 

research, for often they fail to maximize the returns that come from
their research for these institutions and the personnel within them.
In many cases this has led to hesitancy or resistance on the part
of school personnel to get involved in educational research, or even
to the closing of entire school districts to researchers. Because
of the heavy demands that the present project made upon participants,
special attention was directed to making participation in it maximally
beneficial. Toward this end a two-pronged procedure was worked out:

in the study, and b) give as much information as possible about the
study to those who participated in it.^ Three types of information
were provided: 1) a history of the research that led to the study,
its significance, and the contribution which the present study could
conceivably make to educational practice (this was provided during
"recruiting meetings" prior to participation), 2) a visual summary
of each participant's behavior, in the form of profiles, for each
of the nine half-hour periods they were observed in the study (this
was provided in the form of a seminar during the term following
participation), and 3) a written summary of results at the completion
of the study. The addition of the seminar at which teachers could
view and discuss detailed records of their own behavior proved to
be highly satisfactory, though costly of time and energy, and is
recommended as a worthwhile procedure to follow when information
of this kind is available. A copy of a memorandum describing the
procedure and a copy of a behavioral profile given to teachers for
discussion in the seminar appear as Attachments 2 and 3 respectively.



^Special thanks are due to Dr. Jack Hall who helped work out 
the "Information Feedback" procedure that is described below and 
apply it within the context of the Elementary Teacher Education 
Program at Oregon State University.



Chapter IV

RESULTS AND DISCUSSION 

The data have been ordered according to the four Issues investi- 
gated in the study: 

1) Can the results obtained by Schalock, Beaird and Simmons (1964)
be replicated?

2) Can the per cent of variance accounted for in teaching behav-
ior by the prediction scheme used in the Schalock, Beaird and Simmons
study be increased by including in the prediction equation measures of
situational factors that affect teaching behavior?

3) Do the results obtained in (1) and (2) above with student
teachers vary when the methodology is applied to experienced teachers?
and 

4) Do the results obtained in (1) , (2) , and (3) above vary as 
the behavioral samples on which the criterion measures are based vary? 

According to this ordering three separate analyses would have had
to have been run on each of the first three questions in order to answer
question 4; that is, each question would have had to have been analyzed
using criterion measures based on 1, 2, and 3 days of observation respec-
tively. Operationally, this would have required 528 regression runs to
be made, a cumbersome and costly procedure. In an effort to short-cut
this process, and still obtain the essential information desired on the
relationship between length of behavioral sample and stability of criter-
ion measure, a straightforward analysis of the differences obtained in
criterion measures as a function of length of behavioral sample was under-
taken. The rationale underlying the analysis was one of economy: if no
differences were found in measures as a function of length of behavioral
sample then not only could the tripling of regression runs be avoided
but a smaller amount of data (1 or 2 days' data vs. that of 3 days') be
handled in preparing the needed regression runs. The criterion used
in the analysis against which to compare differences was the 3 day
behavioral sample.

Since whatever regression runs were to be made in the study depended
upon the results of this analysis, it was undertaken first.

An Analysis of the Relationship Between Length of Behavioral Sample
and Stability of Criterion Measures

It will be recalled that three different behavioral samples were
obtained on subjects: a day 1 sample (3 one-half hour observations on
a given teacher with a given class, two in the morning and one in the
afternoon), a day 1 + day 2 sample (both on the same teacher with the
same class), and a day 1 + day 2 + day 3 sample (all on the same teacher
with the same class). To determine the length of behavioral sample
required to insure stability of criterion data, each individual was
assigned three scores for each criterion measure. The first score was
determined by summarizing the observations made on the first day, the
second score by summarizing the observations made on the first two days,
and the third score by summarizing the observational data obtained
during all three days. Using the latter score as a standard, the first
two scores were compared against it to determine the feasibility of
utilizing shorter behavioral samples. The rationale underlying this
procedure was straightforward: if it were found that scores based on
one or two days of observation varied significantly from the final
score one would have to conclude that a one or two day observation was
not sufficient to insure a stable measure of teacher behavior. Also,
if this were the case, the question of the length of behavioral sample
required to obtain stability would remain unanswered. On the other
hand, if it were found that either a one or two day sample of behavior
provided essentially the same measures as did the three day standard
then one would be justified in using either the one, two or three day
sample in deriving criterion measures.
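
The comparison just described can be sketched in a few lines. The sketch is illustrative only: the report does not state how daily observations were summarized or which significance test was applied, so the use of a running mean and of a paired t statistic are assumptions, as are the function names.

```python
import math
import statistics


def cumulative_scores(daily):
    """Return the score based on the first k days, for k = 1..n.

    Assumption: observations are summarized by a simple mean
    over the days included.
    """
    scores = []
    total = 0.0
    for k, value in enumerate(daily, start=1):
        total += value
        scores.append(total / k)
    return scores


def paired_t(short_sample, standard):
    """Paired t statistic for short-sample scores against the standard.

    Assumption: a paired t test stands in for whatever test the
    study actually used. Degrees of freedom are len(standard) - 1.
    """
    diffs = [a - b for a, b in zip(short_sample, standard)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def compare_to_standard(daily_by_teacher):
    """Return (t for 1 day vs. 3 days, t for 2 days vs. 3 days)."""
    cumulated = [cumulative_scores(days) for days in daily_by_teacher]
    one_day = [c[0] for c in cumulated]
    two_day = [c[1] for c in cumulated]
    three_day = [c[2] for c in cumulated]
    return paired_t(one_day, three_day), paired_t(two_day, three_day)
```

Each resulting t value would then be referred to a t table with n - 1 degrees of freedom, where n is the number of teachers in the sample.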

The data that derived from the analysis appear in Table 15. It
will be seen from these data that for both the student teacher and
experienced teacher samples, scores based on a single day's observation
varied significantly from scores based on three days of observation.
This was not the case, however, for scores based upon two days of
observation. For both samples observed in the study no statistically
significant differences were noted between scores based on two days
of observation and those based on three days of observation. Thus,
for purposes of the present study, it was concluded that utilization
of criterion scores based on a single day's observation was not
warranted, but that the utilization of two days of observation, when
each day's observation time is based upon three one-half hour observational
settings, provides as adequate or stable a picture of teacher behavior
as do three days of observation.

Table 15. Mean Scores for Criterion Measures Obtained from Observa-
tional Periods of Different Lengths.



                  Student Teachers             Experienced Teachers
Criterion     1 Day    2 Days   3 Days      1 Day    2 Days   3 Days

    1         .067     .067     [ill.]      [ill.]   [ill.]   [ill.]
    2         .547     .555     .551        .468     .384     .423
    3         .251     .253     .253        .191*    .210     .221
    4         .218*    .223     .228        .252     .234     .216
    5         .835     .835     .839        .796     .798     .820
    6         .745*    .769     .799        .623*    .649     .674
    7         .185*    .251     .262        .316     .325     .359
    8         .606*    .753     .703        .624*    .737     .750
   10         .334*    .357     .356        .252*    .277     .284
   11         .253     .259     .259        .321     .329     .328
   12         .404     .407     .408        .366     .372     .349

* Difference between Day 1 and Day 3 significant at .05 level.
No significant differences appeared between Day 2 and Day 3.
[ill.] Entry illegible in the source copy.



On the basis of these data, two decisions were made: 1) to calculate
criterion measures on the basis of day 1 + day 2 data (the same data
base as used in the parent study), and 2) to run only one set of regression
analyses, instead of three, in replicating and extending the study.

While these data provided a basis for firm decision making in
the present study, and supported the use of the two day sample in
the parent study, they are not sufficient in and of themselves to
answer the full range of questions that need answering in relation
to the issue of behavior sampling. They do indicate that a 2 or
3 day sample is different from a single day, but how would a 2 day
sample compare to a five or ten day sample? More importantly, what
difference would it make if the basis for behavioral sampling were
activities or subject matter topics or stages in the development
of topics? These and other questions ultimately must be answered
if the study of behavior in situation is to be undertaken seriously.
The results of the present study represent a start in this direction,
but a great deal more needs to be done.

A Comparison of the Results Obtained in the Present Study with the
Results Obtained in the Schalock, Beaird and Simmons Study



Using the day 1 + day 2 observation sample as a basis for the
calculation of criterion measures, analyses were run which essentially
replicated the Schalock, Beaird and Simmons study. These data, and
the data from the parent study, are presented in Table 16. It will
be seen from these data that essentially the same results were obtained

Table 16. Percentage of Variance Accounted for in Student Teacher
Behavior in the Parent and Replication Studies

                                                               Composite of
Criter-      Word Test       Film Test     Simulation Test   "Best" Predictors
ion         1st     2nd     1st     2nd      1st     2nd       1st     2nd
Measure    Study   Study   Study   Study    Study   Study     Study   Study

   1       .303    .570    .281    .784     .303    .133      .563    .556
   2       .384    .379    .476    .212     .360    .332      .504    .378
  *3        --     .481     --     .194      --     .359       --     .635
  *4        --     .287     --     .256      --     .112       --     .309
   5       .292    .466    .384    .438     .292    .188      .504    .419
   6       .221    .344    .384    .412     .533    .301      .304    .675
  *7        --     .206     --     .328      --     .375       --     .526
   8       .348    .264    .292    .353     .384    .244      .593    .545
   9       .270    .284    .449    .203     .384    .023      .608    .365
  10       .221    .419    .410    .310     .397    .164      .757    .480
  11       .303    .436    .360    .340     .449    .162      .723    .539

*Criterion measures that were new to the replication study.



in the two studies, though the Word Test in the second study tended
to yield higher correlations than it did in the first and the
Simulation Test tended to yield lower correlations.
It will be recalled that in the parent study the Simulation Test
was consistently the most powerful predictor of the three; in the
replication study it was consistently the least powerful. As would
be expected, because of the decreased effectiveness of the Simulation
Test as a predictor, the composite measure also decreased in its
predictive effectiveness.

These data are at one and the same time encouraging and disappoint-
ing. On the encouraging side is the fact that the Word and Film Tests
maintained themselves as fairly adequate predictors of behavior in
situation. On the discouraging side is the fact that the Simulation
Test failed to replicate in its effectiveness. This is discouraging
not only from the point of view of losing a potentially powerful meas-
uring device, but from the point of view of its implications for test
theory generally. It will be recalled that the hypothesis tested in
the parent study was that as tests became more lifelike in their stimu-
lus and response properties effectiveness of prediction would increase.
In general the hypothesis was supported by the study. The new data
indicate that this may not be so, especially in light of the strong
showing of the Word Test. Whatever the long range conclusion regarding
the hypothesis will be, it is clear that at this point in time it does
not have unequivocal support.

While recognizing this, it also needs to be recognized that several
potential sources of error entered the Simulation data in the replica-
tion study (see pp. 24-27) and it could be that the results are simply
reflective of that error. The results with the Word and Film Tests
would seem to support such an interpretation, for they do essentially
replicate. Assuming this to be a genuine possibility, and recognizing
that the hypothesis tested in the parent study is not only attractive
logically but has in fact once been supported, the better part of
wisdom would seem to be to maintain the hypothesis and set up a series
of studies to test it more fully.

An Analysis of the Effects of Adding to the Prediction Scheme Descriptors
of the Setting in Which Criterion Measures Were Obtained

The data reflecting the consequences of adding to the prediction
scheme measures descriptive of the setting within which teacher behavior
occurred are presented in Table 17. In making these calculations all
eight setting measures (see pp. 41-45) were used as predictors. In the
prediction runs involving the Word, Film and Simulation Tests individu-
ally, all of the setting measures were included; in the prediction run
involving the composite of "best" predictors only those setting measures
that were in fact "best" predictors in previous runs were included.



Table 17. Percentage of Variance Accounted for in Student Teacher
Behavior by Tests and Situational Descriptors

Criter-           Word T            Film T    Simu-    Sim. T    Compo-    Compos.
ion       Word    & Sit-    Film    & Sit-    lation   & Sit-    site of   & Sit-
Measure   Test    uation    Test    uation    Test     uation    Tests     uation
                  Meas's            Meas's             Meas's              Meas's

   1      .570    .644      .784    .865      .133     .252      .556      .551
   2      .379    .534      .212    .318      .332     .438      .378      .584
   3      .481    .576      .194    .449      .359     .518      .655      .587
   4      .287    .581      .256    .561      .112     .429      .309      .397
   5      .466    .543      .438    .517      .188     .335      .419      .475
   6      .344    .663      .412    .655      .301     .494      .675      .571
   7      .206    .489      .328    .565      .375     .565      .526      .568
   8      .264    .772      .353    .675      .244     .626      .545      .472
   9      .284    .693      .203    .585      .023     .653      .365      .264
  10      .419    .684      .310    .619      .164     .484      .480      .553
  11      .436    .685      .340    .553      .162     .530      .539      .415




It will be seen from the data in Table 17 that a surprising amount
of variance in student teacher behavior was accounted for by adding
descriptors of the situation in which they were behaving to the predic-
tion scheme. Without exception, at least when dealing with the Word,
Film, or Simulation Tests independently, the amount of variance
accounted for in criterion measures was increased when the situational
descriptors were added to the prediction scheme. In some cases the
amount of variance accounted for was equivalent to that accounted for
by the formal predictor measures, and in some cases it actually exceeded
that accounted for by the formal measures. When added to the Word Test
the situational descriptors accounted for as much of the variance in
three measures (measures 4, 6 and 7) and more of the variance in two
(measures 9 and 10) than did the subscales of the Word Test itself.
The same was found to be the case with the Film Test, though for some-
what different measures: as much variance was accounted for by the
situational descriptors in measures 4, 8 and 10 and more in measures
3 and 9. Because of the generally low predictive power of the Simulation
Test the situational descriptors accounted for variance equal in amount
to the Simulation Test in two measures (measures 1 and 5) and more in
five (measures 4, 8, 9, 10, and 11). It is interesting to note that the
three measures most susceptible to setting influence (measures 4, 9 and
10) were, respectively, Orientation to the Use of Positive Reinforcement,
Teacher Approachableness, and Individual vs. Group Focus. Considering
that the setting measures were few and only roughly conceived, these are
remarkable findings, and suggest yet another line of research to be under-
taken if prediction to behavior in situation is to be pursued seriously.

As expected, because of the procedure followed in selecting pre-
dictors, the same gains in predictive power did not appear when the
situational descriptors were added to the composite prediction scheme.
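
The gain attributable to the situational descriptors can be read from Table 17 by subtraction. The sketch below does this for the Word Test columns. It assumes, since the report does not say so explicitly, that the variance credited to the descriptors is the difference between the proportion accounted for with the descriptors included and the proportion accounted for by the test alone; the variable and function names are hypothetical.

```python
# Word Test columns of Table 17, keyed by criterion measure.
WORD_ALONE = {
    1: .570, 2: .379, 3: .481, 4: .287, 5: .466, 6: .344,
    7: .206, 8: .264, 9: .284, 10: .419, 11: .436,
}
WORD_PLUS_SETTING = {
    1: .644, 2: .534, 3: .576, 4: .581, 5: .543, 6: .663,
    7: .489, 8: .772, 9: .693, 10: .684, 11: .685,
}


def setting_increment(alone, combined):
    """Variance added by the setting measures (assumed: a simple difference)."""
    return {k: round(combined[k] - alone[k], 3) for k in alone}


increments = setting_increment(WORD_ALONE, WORD_PLUS_SETTING)

# Measures where the added variance is at least as large as that
# accounted for by the Word Test alone.
at_least_equal = [k for k in increments if increments[k] >= WORD_ALONE[k]]
```

For measure 4, for example, the increment is .581 - .287 = .294, which is about the same as the .287 accounted for by the Word Test alone.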

An Analysis of the Results of the Replication Study Extended to 
Experienced Teachers 

Using the same observational base for the calculation of criterion
measures, the same predictor measures, etc., the design of the replica-
tion study with student teachers was extended to a sampling of exper-
ienced teachers. These teachers were drawn from the same schools in
which the student teachers taught and from the same grade levels.
Table 18 contains the data that derived from this extension. Table 19
contains a comparison of these data to those derived in the replication
study with student teachers.

Table 18. Per Cent of Variance Accounted for in Experienced Teacher
Behavior Without Regard for Situational Factors



Criterion     Word     Film     Simulation     Composite

    1         .400     .319        .226           .582
    2         .442     .323        .243           .651
    3         .601     .366        .100           .685
    4         .466     .169        .300           .556
    5         .342     .251        .144           .600
    6         .385     .250        .258           .499
    7         .365     .262        .274           .611
    8         .304     .306        .074           .434
   10         .366     .080        .140           .350
   11         .229     .272        .030           .409
   12         .408     .295        .052           .617






Table 19. A Comparison of the Per Cent of Variance Accounted for in 
Student and Experienced Teacher Behavior Without Regard 
for Situational Factors 



                  Word              Film           Simulation         Composite
Criterion     Stud.   Exper.    Stud.   Exper.    Stud.   Exper.    Stud.   Exper.

    1         .570    .400      .784    .319      .133    .226      .556    .582
    2         .379    .442      .212    .333      .332    .243      .378    .651
    3         .481    .601      .194    .366      .359    .100      .655    .685
    4         .287    .466      .256    .169      .112    .300      .309    .556
    5         .466    .342      .438    .231      .188    .144      .419    .600
    6         .344    .385      .412    .250      .301    .258      .675    .499
    7         .206    .365      .328    .262      .375    .274      .526    .611
    8         .264    .304      .353    .306      .244    .074      .545    .434
   10         .284    .366      .203    .080      .023    .140      .365    .350
   11         .419    .229      .310    .272      .164    .030      .480    .409
   12         .436    .408      .340    .295      .162    .052      .539    .617



Two general observations can be made about these data: 1) they
tend to follow the same general pattern observed in the student data,
in that the Word and Film Tests accounted for a greater portion of vari-
ance than did the Simulation Test, and the composite measure accounted
for slightly less variance than it did in the parent study, and 2)
the Word and Film Tests were differentially effective with the student
and experienced teacher samples. By and large the Word Test was
a more effective predictor with the experienced teachers and the
Film Test was a more effective predictor with the students. Both
were unexpected outcomes. In entering the study it was anticipated
that prediction would be consistently better for experienced teachers
than for student teachers because their experience would permit them
to respond to the situations depicted in the tests in ways which
were similar to ways in which they had responded and/or tend to respond
to comparable situations in the classroom. This expected relationship
between background of experience and predictability of behavior obviously
did not appear.

Even more surprising, at least at first blush, was the finding that
the Word Test was a better predictor of experienced teacher behavior than
the Film Test. As initially conceived, the theory of testing from which
the predictive measures derived led to the expectation that the Film
Test would be superior. In retrospect it appears that the theory is too
simple. It may be, for example, that the theory holds only for persons
who have had a limited background of experience in classrooms, and
thereby have only a limited backlog of concrete referents to bring to
a testing situation like that presented by the Word, Film, and Simulation
Tests. For these people, the concrete referents provided by the
Film and Simulation Tests may be an advantage; for persons with a
broad range of classroom experience the same referents may be a
disadvantage, for they may limit their perception to a single situation
which may in fact not be representative of the situations with which
they generally deal. If this should be true then one would expect
the Word Test, with its more general class of referents, to be
more effective as a predictor for experienced teachers. Whatever
the eventual explanation may be, the results of the comparative
study between student and experienced teachers suggest that an ap-
proach to measurement that is maximally effective with one may not
be maximally effective with the other.
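
The statement that the Word Test was "by and large" more effective with experienced teachers, and the Film Test with students, can be checked by a simple tally over the entries of Table 19. The sketch below is illustrative only; the values are those of the Word and Film columns of Table 19, and the names used are hypothetical.

```python
# (student, experienced) proportions of variance from Table 19,
# keyed by criterion measure.
WORD = {
    1: (.570, .400), 2: (.379, .442), 3: (.481, .601), 4: (.287, .466),
    5: (.466, .342), 6: (.344, .385), 7: (.206, .365), 8: (.264, .304),
    10: (.284, .366), 11: (.419, .229), 12: (.436, .408),
}
FILM = {
    1: (.784, .319), 2: (.212, .333), 3: (.194, .366), 4: (.256, .169),
    5: (.438, .231), 6: (.412, .250), 7: (.328, .262), 8: (.353, .306),
    10: (.203, .080), 11: (.310, .272), 12: (.340, .295),
}


def count_experienced_higher(table):
    """Number of criteria on which the experienced-teacher value is larger."""
    return sum(1 for student, experienced in table.values()
               if experienced > student)


def count_student_higher(table):
    """Number of criteria on which the student-teacher value is larger."""
    return sum(1 for student, experienced in table.values()
               if student > experienced)


word_favoring_experienced = count_experienced_higher(WORD)
film_favoring_students = count_student_higher(FILM)
```

On these figures the Word Test accounts for more variance with experienced teachers on 7 of the 11 criteria, and the Film Test accounts for more variance with students on 9 of the 11. A tally of this kind says nothing about the size or significance of the differences.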



An Analysis of the Effects of Adding to the Prediction Scheme with 
Experienced Teachers Descriptors of the Setting in which Criterion
Measures Were Obtained 

The data reflecting the consequences of adding to the prediction
scheme measures descriptive of the setting within which experienced
teacher behavior occurred are presented in Table 20. In making
these calculations the setting measures were used as predictors
in the same way they were used in the replication study with student
teachers.

Table 20. Per Cent of Variance Accounted for in Experienced Teachers
by Tests and Situational Descriptors

Criterion  Word   Word T   Film   Film T   Simula-  Sim. T   Compos-  Compos.
Measure    Test   & Sit.   Test   & Sit.   tion     & Sit.   ite of   & Sit.
                  Meas's          Meas's   Test     Meas's   Tests    Meas's

    1      .400   .765     .319   .648     .226     .545     .582     .569
    2      .442   .692     .333   .580     .243     .420     .651     .581
    3      .601   .632     .366   .587     .100     .234     .685     .670
    4      .466   .725     .169   .495     .300     .597     .556     .557
    5      .342   .590     .251   .513     .144     .307     .600     .564
    6      .385   .827     .250   .651     .258     .642     .499     .429
    7      .365   .500     .262   .402     .274     .358     .611     .574
    8      .304   .546     .306   .464     .074     .298     .434     .481
   10      .366   .610     .080   .342     .140     .299     .350     .355
   11      .229   .519     .272   .572     .030     .247     .409     .287
   12      .408   .530     .295   .392     .052     .108     .617     .497
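
The gain attributable to the setting measures can be read off Table 20 by subtraction. The sketch below is an illustration only and was not part of the original analysis; it uses the Word Test and Film Test columns of Table 20, and the names `WORD_TEST`, `FILM_TEST`, `gains` and `mean_gain` are hypothetical.

```python
# Proportions of variance accounted for, copied from Table 20
# (experienced teachers). Keys are criterion measure numbers; each value is
# (test alone, test plus situational measures).
WORD_TEST = {
    1: (0.400, 0.765), 2: (0.442, 0.692), 3: (0.601, 0.632),
    4: (0.466, 0.725), 5: (0.342, 0.590), 6: (0.385, 0.827),
    7: (0.365, 0.500), 8: (0.304, 0.546), 10: (0.366, 0.610),
    11: (0.229, 0.519), 12: (0.408, 0.530),
}
FILM_TEST = {
    1: (0.319, 0.648), 2: (0.333, 0.580), 3: (0.366, 0.587),
    4: (0.169, 0.495), 5: (0.251, 0.513), 6: (0.250, 0.651),
    7: (0.262, 0.402), 8: (0.306, 0.464), 10: (0.080, 0.342),
    11: (0.272, 0.572), 12: (0.295, 0.392),
}


def gains(table):
    """Increase in variance accounted for, per criterion measure,
    when the situational descriptors are added to a test."""
    return {m: round(combined - alone, 3) for m, (alone, combined) in table.items()}


def mean_gain(table):
    """Average increase across the eleven criterion measures."""
    per_measure = gains(table)
    return round(sum(per_measure.values()) / len(per_measure), 3)


if __name__ == "__main__":
    for name, table in (("Word Test", WORD_TEST), ("Film Test", FILM_TEST)):
        print(name, gains(table), mean_gain(table))
```

Simple differences of this kind show the absolute gain for each criterion measure; a proportionate comparison would divide each gain by the variance accounted for by the test alone.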

As with the student teacher data, a surprising amount of variance
in experienced teacher behavior was accounted for by adding descriptors
of the setting to the prediction scheme. As might be expected,
proportionately more variance was accounted for when these measures
were combined with the Film and Simulation Tests than when they were
combined with the Word Test measures, but essentially the data replicate
those obtained with student teachers. Taken together, the data on
the effectiveness of situational descriptors as predictors lead to
the obvious conclusion that if prediction to behavior in situation is
to be undertaken seriously then a great deal of attention will need to
be directed to the measurement and/or control of situational factors.

Chapter V



SUMMARY AND CONCLUSIONS



Recently completed research by Schalock, Beaird and Simmons (1964) 
on the predictive power of tests which use motion pictures as test 
stimuli suggested that a methodology may now be at hand which will permit 
the prediction of teaching behavior in the classroom. Using student 
teachers as subjects, Schalock et al. were able to demonstrate multiple
correlations of .69 to .87 between scores on a battery of situational-
response tests (tests which use motion picture representations of class-
room situations as test stimuli) administered prior to student teaching
and observational measures of their behavior in the classroom during 
student teaching. This represented an unusual accomplishment, for typi- 
cally studies in the behavioral sciences have not been able to account 
for more than 50 per cent of the variance in any criterion that has been
predicted to, and when the criterion has been as complex as teaching
behavior, the level of prediction has nearly always been less. In the
Schalock, Beaird and Simmons study at least 50 per cent of the variance
was accounted for in each of the 15 separate criterion measures used
(concrete behavior of teachers in the classroom) and as much as 75 per
cent of the variance was accounted for in some.

Several factors, however, tended to limit the confidence that could
be placed in the findings that came from the study. Two factors
could have led to spuriously high correlations: 1) the final set of
subscales used as predictors in the study was selected in a somewhat
unorthodox manner; and 2) the number of subjects tested in the study
(40) was small and the number of predictor variables used (18) was
large. Over and against these sources of error were 1) the fact that
the measures used in the study were prototypic in nature and therefore
probably not as powerful as such measures could ultimately become;
and 2) the failure to control for situational factors that interact
with or are thought to influence teaching behavior in the classroom.
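
The concern about a small number of subjects and a large number of predictors can be made concrete with the standard adjusted R-squared correction. The sketch below is an illustration, not the analysis used in either study; it assumes that all 18 predictor variables entered a single equation fitted to 40 subjects, and the function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(multiple_r, n_subjects, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1) / (n - k - 1).

    multiple_r   -- multiple correlation obtained in the sample
    n_subjects   -- number of subjects (n)
    n_predictors -- number of predictors in the equation (k)
    """
    if n_subjects - n_predictors - 1 <= 0:
        raise ValueError("need more subjects than predictors plus one")
    r2 = multiple_r ** 2
    return 1 - (1 - r2) * (n_subjects - 1) / (n_subjects - n_predictors - 1)


if __name__ == "__main__":
    # Range of multiple correlations reported for the parent study.
    for r in (0.69, 0.87):
        print(r, round(r ** 2, 3), round(adjusted_r2(r, 40, 18), 3))
```

Under these assumptions a multiple correlation of .69 corresponds to about 48 per cent of the variance before adjustment and roughly 3 per cent after it, while .87 corresponds to about 76 per cent before and roughly 55 per cent after. The lower correlations are therefore the ones most exposed to shrinkage.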

Given the data that derived from the study, and the many potential
sources of error that accompanied them, a proposal was submitted
immediately upon the completion of the study to the U.S. Office of
Education for its replication and extension. Three factors led to
the second proposal: (1) the essentially unprecedented results obtained
in the parent study, (2) the numerous potential or real sources of
error in it, and (3) the desire to avoid the pitfalls of uncritical
test adoption, that is, the desire to forestall the users of tests
from moving too quickly to adopt the instruments developed in the
study for use in their own programs of research or evaluation. Since
these instruments were new, and since the first predictive efforts
with them were so promising, there was danger that the measures might
be applied in areas where a basis for their application did not exist.

Four major objectives guided the present study:

(1) to replicate the parent study; 

(2) to extend the design of the parent study to experienced,
primary grade teachers;

(3) to strengthen both replication studies by increasing the
number of subjects used in each and including in them
measures of situational variables that affect predictive
accuracy; and

(4) to investigate the effects on prediction of deriving
criterion measures from behavioral samples of varying
lengths.

A fifth objective evolved as the study progressed, namely to
strengthen the criterion measures used in it. This required extensive
work on the observation system developed in the parent study, and led
in part to a request for a 6-month extension of the study. A by-
product of this extension is a monograph (see Attachment 1) that
provides an overview of the observational system that derived from the
effort. The system is referred to generally as the Teaching Research
System for the Description of Teaching Behavior in Context, and repre-
sents the most exhaustive measure of teaching behavior that currently
is available.

A Summary of the Replication Study

Every effort was made to replicate the parent study in its exact
detail. Subjects were drawn from the same population, the same predic-
tive measures were used, criterion measures were equivalent though
strengthened, and the same analyses were applied.

Thirty-nine senior women, majoring in elementary education with
specialization in the primary grades at either Oregon State University
or Oregon College of Education, served as subjects for the study.
Subjects were drawn from the pool of students who did their student
teaching in the Winter and Spring terms of the 1965-66 academic year
and the Fall and Winter terms of the 1966-67 academic year. Only stu-
dents who volunteered to take part in the study and who did their
student teaching within a 60-mile radius of Oregon State University
were eligible for inclusion.

Four predictor tests, varying on a continuum of stimulus and
response complexity, were used in the study: 1) a traditional paper-
and-pencil attitude scale, where the test stimulus was a statement
describing an orientation to the teaching function and response was
defined by agreement or disagreement with the statement (The Minnesota
Teacher Attitude Inventory), 2) a situational-response test where the
test stimuli were written descriptions of filmed classroom situations
and response was defined by agreement or disagreement with statements
made in relation to the situational descriptions (The Word Test), 3) a
situational-response test where the test stimuli were motion picture
sequences of classroom situations and response was defined as in (2)
above (The Film Test), and 4) a situational-response test where the
test stimuli were also motion picture sequences of classroom situations
but the response was free, i.e., the subject responded to the filmed
situation as if she were the teacher in the situation (The Simulation
Test).

The predictor measures were administered in group settings at the 
close of the term that preceded the term in which the subjects did 
their student teaching. In contrast to the parent study, however, a 
totally random order of test presentation under supervised conditions 
was not followed: the Film Test and Simulation Test were always admin-
istered under supervised conditions, and either the Word Test or the
MTAI was administered under non-supervised conditions, i.e., at home.
Furthermore, the Film and Simulation Tests were always administered
in a one-two order, and the MTAI or Word Test always in a three-four 
order. While the Film and Simulation Tests were always assigned their 
order of presentation randomly, and the Word Test and the MTAI were 
always assigned to the non-supervised condition randomly, the inabil- 
ity to follow a totally random assignment of test order represented a 
source of error in the data and a departure from the procedure followed 
in the parent study. 

Eleven measures descriptive of the interaction patterns of teachers 
and learners in the classroom served as criterion measures for the study. 
All were derived from the category descriptions of classroom interaction 
provided by the Teaching Research System for the Description of Teaching 
Behavior in Context (Schalock and Mlcek, 1968). Three features charac- 
terized the measures: 

1) They were theoretically relevant, i.e., they related to 
dimensions of the model of teaching behavior used as a 
guide to instrument development throughout the study, and 
as a consequence exhibited a close tie to the predictive 
instruments that were developed; 

2) They were complex in the sense that they represented a
pooling of a number of conceptually related behaviors
into a ratio or combination score. Theoretically this
provided a more stable and comprehensive measure than
would single classes of behavior; and

3) The measures took full advantage of the power of the
observational system in the sense that they tied to
(a) various classes of child behavior, (b) the teacher's
response to classes of child behavior, and (c) the
child's response to the teacher's behavior.

Eight of the eleven measures were comparable to those used in the parent 
study; three were new. These were added to replace the measures used in 
the earlier study that derived from rating scales. 

The same procedures were followed in making classroom observations 
as were followed in the parent study. Each subject was observed for
nine 30-minute periods during the final two weeks of her student teach-
ing experience. Observations were made on days approximately one week
apart by three different observers. Three 30-minute observations were
made each day; two in the morning and one in the afternoon. During
each observation period, subjects had primary teaching responsibilities
in their rooms.

A Summary of Extension I: The Addition of Situational Data to the
Prediction Scheme

Situational factors affecting the behavior being predicted were 
not taken into account in the parent study. Factors such as unplanned 
events, composition of the class, physical conditions within the class-
room and the nature of the activity in which teacher and learners engaged
were not controlled. Since these are likely to be significant deter-
minants of teaching behavior, the present study attempted to take
them into account. The aim of the present effort was to obtain prototypic
measures of such factors and include them in the prediction scheme as
control variables to see if their inclusion would significantly increase
the amount of variance accounted for in the criterion measures.

Two sets of measures were used in this respect: 1) those derived
through interview with the teachers immediately after they had been
observed, and 2) those derived from the records of classroom interaction
made during the course of observation. Four global measures were derived
from the interview data:

1) A descriptor of the physical setting, i.e., the space available
per learner, the presence of individual desks or tables, heat,
lighting, proximity to activity on the playground or in the
halls, etc.;

2) A descriptor of the administrative setting, i.e., the philoso-
phy of the building principal as to the nature of desirable
or undesirable classroom activity;

3) A descriptor of the characteristics of the class, i.e., their
socioeconomic status, the ratio of boys to girls, the number
of habitually disruptive children in the class, etc.; and

4) A descriptor of unusual or unplanned events that were 
disruptive to planned learning experiences. 

Four setting measures were also derived from the records of class-
room interaction. These measures consisted simply of the occurrence of
behaviors that were disruptive to the class during the classroom obser-
vations. The four measures used in this respect were:

1) A score representing the total number of such instances,

2) A score representing the number of such incidents that were
non-academic in nature and directed toward the teacher,

3) A score representing the number of incidents of the same kind
directed to other children, and

4) The number of instances that had an academic focus but which
were sufficiently inappropriate in nature to cause them to be
disruptive.

By combining the four measures derived from the observation data and 
the four derived from the interview data, a total of eight setting or 
situational measures were available for use in the study as predictors. 

For purposes of analysis these were simply added to the set of formal 
predictors that derived from the MTAI, Word, Film and Simulation Tests.

A Summary of Extension II: The Repetition of the Study, Including the
Use of Situational Measures, with Experienced Primary Grade Teachers

With one exception the same measures and procedures as outlined in
the replication and extension of the parent study with student teachers
were followed in the extension of the parent study to experienced teachers.
The one exception occurred in relation to the time periods in which tests
could be administered and observations could be made. In contrast to the
rather rigid schedule of testing at the end of the term prior to student
teaching and observation within the last two weeks of the student teaching
experience, the experienced teachers could be tested and observed at any
time.

Subjects in the study were thirty-nine experienced primary grade
teachers drawn from the school districts in which the student teachers
who participated in the study did their student teaching, and in the
same proportion. Only those who volunteered for the project were included
in it. No restrictions were placed upon length of teaching experience
beyond having taught for at least one year prior to taking part in the
study. The testing and observations required by the study were fitted
to the convenience of the teachers within a given district and to the
time schedule of project personnel.

A Summary of the Investigation of the Effects on Prediction of Using 
Behavioral Samples of Differing Lengths in Obtaining the Criterion 
Measures 

The rationale underlying the inclusion of an investigation of
this kind in the present study rests upon the fact that the study
depends upon behavioral sampling for its criterion measures but as
yet there is no conclusive evidence as to the length or number or
distribution of behavioral samples needed in order to obtain stable
or representative criterion measures. The plan of the investiga-
tion was relatively simple: compare the magnitude of the correlations
derived from the prediction scheme with criterion measures based
upon 1 hour, 2 hours and 3 hours of observation time, respectively.
This was to be done for both student and experienced teachers. Rather
than pursue this plan, however, a straightforward comparative analysis
of criterion measures was made to see if measures based upon 1, 2
or 3 day samples of behavior differed from one another. The 3 day
sample was used as a standard in the analysis.

Conclusions

1. Were the results obtained by Schalock, Beaird and Simmons able to
be replicated? Yes and no. Essentially the same results were obtained in
the two studies with the Word and Film Tests, but different results were
obtained with the Simulation Test. It will be recalled that in the parent
study the Simulation Test was consistently the most powerful predictor of 
the three, whereas in the replication study it was consistently the least 
powerful. As would be expected, because of the decreased effectiveness 
of the Simulation Test as a predictor, the prediction scheme that combined 
the "best" subscale predictors from the three measures also decreased in 
its predictive effectiveness. 

What do these results mean for the prediction of teaching behavior 
in the future? What do they mean for test theory? On both counts they 
are both encouraging and discouraging. On the encouraging side is the 
fact that the Word and Film Tests maintained themselves as fairly adequate 
predictors of behavior in situation, giving rise thereby to hope that 
teaching behavior may in time become a fairly predictable phenomenon. 

On the discouraging side is the fact that the Simulation Test failed to 
replicate in its effectiveness. This not only casts doubt on the trust- 
worthiness of the measure that proved to be the most effective predictor 
in the parent study but also on the viability of the theoretical position 
underlying the study, that is, that as tests become more lifelike in their 
stimulus and response properties effectiveness of prediction should increase. 
Fortunately, these doubts may be more severe than they need to be, for there
is reason to believe that the relative ineffectiveness of the Simulation
Test in the present effort was a function of the coding scheme applied
to it rather than the test itself. Assuming this to be the case, it
may well be that neither the test nor the hypothesis needs to be dis-
carded. The better part of wisdom would seem to be to maintain the
hypothesis, devise new tests or new scoring procedures, and undertake
a series of studies designed to test the methodology thoroughly.

2. Was the percent of variance accounted for in teaching behavior
by the prediction scheme used in the Schalock, Beaird and Simmons study
increased by including in the prediction equation measures of
situational factors that affect teaching behavior? Unequivocally, yes!
Without exception, at least when dealing with the Word, Film, or
Simulation Tests independently, the amount of variance accounted for
in criterion measures was increased when the situational descriptors
were added to the prediction scheme. In some cases the amount of
variance accounted for was equivalent to that accounted for by the
formal predictor measures, and in some cases it actually exceeded
that accounted for by those measures. For example, when added to the
Word Test the situational descriptors accounted for as much of the
variance in three measures and more of the variance in two than did the
subscales of the Word Test itself. The same was found to be the case
with the Film Test and the Simulation Test, though in the latter case,
because of the generally low predictive power of the Simulation Test,
the situational descriptors accounted for equal variance in two
measures and more in five. Considering that the setting measures
were few and only roughly conceived, these are remarkable findings,
and suggest a critically needed line of research to be undertaken
if prediction to behavior in situation is to be effective.

3. Do the results obtained in (1) and (2) above with student 
teachers vary when the methodology is applied to experienced teachers? 
Essentially no. By and large the same level of relationship was 
found between predictor measures and the classroom behavior of ex- 
perienced teachers as was found between these measures and the class- 
room behavior of student teachers. Also, essentially the same results 
were obtained with experienced teachers when descriptors of the setting 
were added to the prediction scheme. The only major point of 
variance in the results obtained in the two studies was the finding that 
the Word and Film Tests were differentially effective with the student 
and experienced teacher samples. By and large the Word Test was a 
more effective predictor with the experienced teachers and the Film 
Test was a more effective predictor with the students. 

These results were essentially unexpected. In entering the 
study it was anticipated that prediction would be consistently better 
for experienced teachers than for student teachers because their 
experience would permit them to respond to the situations depicted 
in the tests in ways which were similar to ways in which they had 
responded to comparable situations in the classroom. This expected 
relationship between background of experience and predictability 
of behavior obviously did not appear. Even more surprising, at least 
at first blush, was the finding that the Word Test was a better predictor
of experienced teacher behavior than the Film Test. As initially
conceived, the theory of testing from which the predictive measures
derived led to the expectation that the Film Test would be superior.
In retrospect it appears that the theory is too simple. It may be,
for example, that the theory holds only for persons who have had
a limited background of experience in classrooms, and thereby have
only a limited backlog of concrete referents to bring to a testing
situation like that presented by the Word, Film, and Simulation
Tests. For these people, the concrete referents provided by the
Film and Simulation Tests may be an advantage; for persons with a
broad range of classroom experience the same referents may be a
disadvantage, for they may limit their perception to a single situation
which may in fact not be representative of the situations with
which they generally deal. If this should be true then one would
expect the Word Test, with its more general class of referents, to
be more effective as a predictor for experienced teachers. Whatever
the eventual explanation may be, the results of the comparative
study between student and experienced teachers suggest that an
approach to measurement that is maximally effective with one may
not be maximally effective with the other.

4. Do the results obtained in (1), (2), and (3) above vary
as the behavioral samples on which the criterion measures are based
vary? This question cannot be answered directly, but on the
basis of indirect evidence the answer would appear to be yes. In
order to answer the question in the form it was asked, three separate
analyses would have had to be run on each of the first three
questions, that is, each question would have had to be analyzed
using criterion measures based on 1, 2, and 3 days of observation
respectively. Operationally, this would have required 528 regression
runs to be made, a cumbersome and costly procedure. In an effort
to short-cut this process, and still obtain the essential information
desired on the relationship between length of behavioral sample and
stability of criterion measure, a straightforward analysis of the
differences obtained in criterion measures as a function of length
of behavioral sample was undertaken. The rationale underlying the
analysis was one of economy: if no differences were found in measures
as a function of length of behavioral sample then not only could
the tripling of regression runs be avoided but a smaller amount of
data (1 or 2 days' data vs. that of 3 days) could be handled in preparing
the needed regression runs. The criterion used in the analysis against
which to compare differences was the 3 day behavioral sample.
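
The report does not itemize the 528 regression runs. The sketch below shows one decomposition that happens to reproduce the figure; it is an assumption, not a statement of how the count was actually reached. It takes the eleven criterion measures and the eight predictor sets that appear as columns in Table 20, the two teacher samples, and the three lengths of behavioral sample. All names are hypothetical.

```python
# Assumed factors (not itemized in the report):
CRITERION_MEASURES = 11  # measures 1-8 and 10-12 in Table 20
PREDICTOR_SETS = 8       # Word, Film, Simulation, Composite, each with
                         # and without the situational measures
TEACHER_SAMPLES = 2      # student teachers and experienced teachers
SAMPLE_LENGTHS = 3       # criterion based on 1, 2, or 3 days of observation


def regression_runs(criteria, predictor_sets, samples, lengths):
    """One regression per criterion, predictor set, sample, and sample length."""
    return criteria * predictor_sets * samples * lengths


if __name__ == "__main__":
    total = regression_runs(CRITERION_MEASURES, PREDICTOR_SETS,
                            TEACHER_SAMPLES, SAMPLE_LENGTHS)
    one_length = regression_runs(CRITERION_MEASURES, PREDICTOR_SETS,
                                 TEACHER_SAMPLES, 1)
    print(total, one_length)
```

On this reading, a single sample length requires 176 runs and tripling it gives 528, which matches the "tripling of regression runs" the text refers to.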

The data that derived from the analysis indicated that for both 
the student teacher and experienced teacher samples scores based 
on a single day's observation varied significantly from scores based 
on three days of observation. This was not the case, however, for 
scores based upon two days of observation. For both samples observed 
in the study no statistically significant differences were noted 
between scores based on two days of observation and those based on three
days of observation. Thus, for purposes of the present study, it was
concluded that the utilization of two days of observation, when each day's
observation time was based upon three one-half hour observational
settings, provides as adequate or stable a picture of teacher behavior
as do three days of observation.
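
The report does not name the statistical test used to compare the shorter samples with the 3 day standard. One common choice for comparing two scores obtained on the same subjects is a paired t statistic, sketched below as an assumption rather than as the authors' procedure. The function name `paired_t` is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(short_sample_scores, full_sample_scores):
    """Paired t statistic and degrees of freedom.

    short_sample_scores -- criterion score per teacher from 1 or 2 days
    full_sample_scores  -- criterion score per teacher from all 3 days,
                           listed in the same teacher order
    """
    if len(short_sample_scores) != len(full_sample_scores):
        raise ValueError("both lists must cover the same teachers")
    diffs = [s - f for s, f in zip(short_sample_scores, full_sample_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two teachers")
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("differences are constant; t is undefined")
    standard_error = spread / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

The statistic would be computed once per criterion measure for the 1 day scores and once for the 2 day scores, each against the 3 day scores, and referred to a t table with the returned degrees of freedom.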

While these data provided a basis for firm decision making in
the present study, and supported the use of the two day sample in
the parent study, they are not sufficient in and of themselves to
answer the full range of questions that need answering in relation
to the issue of behavior sampling. They do indicate that a 2 or
3 day sample is different from a single day, but how would a 2 day
sample compare to a five or ten day sample? More importantly, what
difference would it make if the basis for behavioral sampling were 
activities or subject matter topics or stages in the development 
of topics? These and other questions ultimately must be answered 
if the study of behavior in situation is to be undertaken seriously. 
The results of the present study represent a start in this direction,
but a great deal more needs to be done.

March 3, 1966



MEMORANDUM



TO: Dean Zeran

FROM: Del Schalock

RE: A seminar in which participants in the prediction of classroom
behavior project may enroll and receive 1 (one) hour of credit

As you know, Dr. Jim Beaird and I are replicating the research that
we did several years ago with student teachers in the primary grades from
OSU and OCE. You will recall that the research calls for the students
to take a series of 4 tests prior to their student teaching experience
and to be observed in the classroom for three half-hour periods on three
separate days during the last few weeks of their student teaching exper-
ience. Observations are focused upon management behavior and involve
the systematic description and recording of all management interchange
between teacher and children.

Thus far, 18 student teachers from OSU have participated. Spring
term students soon will be contacted about their interest in the study,
and I anticipate that another 15 or so students will become involved.
We also have approached the cooperating teachers about their participa-
tion in the project and approximately 2/3 of them wish to take part.

In order to make participation in the research as meaningful as
possible, Dr. Jack Hall and I worked out a plan last fall whereby we
prepare for each participant in the project "behavior profiles" for each
half-hour that they are observed and discuss these with them at the end
of the year within the framework of a 1 or 2 day seminar meeting. We
anticipated that both student and cooperating teachers would attend the
seminar, though it would be a voluntary matter, and that the discussion
would center around individual profiles, the contrast between various
student teacher and various cooperating teacher profiles, and the rela-
tionship between behavior patterns and situational factors. Also,
individual teacher behavior will be related generally to a model of
teaching behavior that has been developed in relation to the project.

It is also understood that I am to plan the seminar in cooperation
with interested members of the Department of Elementary Education staff.
This has awaited the completion of some writing on my part, but will
take place during spring term.

In discussing the possibility of the seminar with Dr. Hall and his
staff it was suggested that an hour of credit be attached to it with
the thought that this would represent a formal record of the students'
and cooperating teachers' participation in the project, as well as
serving as an added inducement to participation. While the face-to-face
contact within the seminar itself would not constitute a sufficient basis
for an hour of credit, it was thought that the half day required in examina-
tion, the three days required in observation, and the informal contacts
throughout a term with project staff would combine with the day of face-
to-face contact in discussion to provide a legitimate basis for 1 hour of
credit. All subjects have been approached from this standpoint and are
expecting to register for the hour of credit spring term. Saturday,
May 28th, has been set tentatively as the date for meeting with the student
teachers while Saturday, June 4, has been set tentatively as the date to
meet with the cooperating teachers.

In order that the participants may register for the seminar, a course
number and class cards will have to be set up and available at spring term
registration. I trust that this is possible, and I hope that it will not
cause you inconvenience to arrange it at this time. Participation in the
project has seemed to be a meaningful experience to the students and I
think that they are looking forward to the discussion within the seminar.

If I may be of further help in arranging the seminar, please call
upon me.



cc: Dr. Jack Hall

[Figure 5 graph: "TEACHING OPERATIONS USED IN INSTRUCTION"; axis label "% Teacher Acts"]

Figure 5. A graphic representation of the proportion of instructional acts
used by a teacher which fall into one of the three major components
of instruction. The component analysis represents the first level
of analysis used in classifying TEACHING OPERATIONS.

Figure 7. A graphic representation of the proportion of facilitory acts by the
teacher which fall into one of the functions served by teaching oper.

Figure 10. A graphic representation of the proportion of facilitory or management acts used
by a teacher, classified according to the instructional moves they represent.



[Figure 11 graph: "PROPORTION OF CENSORSHIP MOVES IN RELATION TO ALL INSTANCES OF CENSORSHIP OF LEARNER BEHAVIOR"; panel title "Developmental Evaluation"]

Figure 11. A graphic representation of the various censorship moves
used by a teacher. The Developmental Evaluation graph
represents the proportion of negative evaluative moves
used to evaluate a learner's academic performance; the
Facilitory Evaluation graph represents the proportion of
negative evaluative moves used to censor or discipline
a learner who is either out-of-focus, i.e., a learner who
is not in the same FOCUS as the teacher, or in-focus, but
behaving inappropriately.



[Figure 12 graph: recipient categories "an individual learner", "less than half of the group", "more than half of the group"]

Figure 12. A graphic representation of the proportion of
all teacher acts by recipient (the target
audience the teacher sends the message to).

Figure 13. A graphic representation of the instances of
affect in the classroom for both teacher and
learners.